# on 08-Jan-2020 (Wed)

#### Annotation 4761932664076

 The epoch is bracketed by two major events in Earth's history.

Paleocene - Wikipedia
…in the modern Cenozoic Era. The name is a combination of the Ancient Greek palæo- meaning "old" and the Eocene Epoch (which succeeds the Paleocene), translating to "the old part of the Eocene". The epoch is bracketed by two major events in Earth's history. The K-Pg extinction event, brought on by an asteroid impact and volcanism, marked the beginning of the Paleocene and killed off 75% of living species, most famously the non-avian dinosaurs.

#### Annotation 4762908101900

 #MLBook Let’s start by telling the truth: machines don’t learn. What a typical “learning machine” does is find a mathematical formula which, when applied to a collection of inputs (called “training data”), produces the desired outputs. This mathematical formula also generates the correct outputs for most other inputs (distinct from the training data), on the condition that those inputs come from the same or a similar statistical distribution as the one the training data was drawn from.

#### pdf

cannot see any pdfs

#### Annotation 4762911509772

 #MLBook #name-origin So why the name “machine learning” then? The reason, as is often the case, is marketing: Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term in 1959 while at IBM. Similarly to how in the 2010s IBM tried to market the term “cognitive computing” to stand out from competition, in the 1960s, IBM used the new cool term “machine learning” to attract both clients and talented employees.


#### Annotation 4762913869068

 #MLBook #definition #machine-learning Machine learning is a universally recognized term that usually refers to the science and engineering of building machines capable of doing various useful things without being explicitly programmed to do so.


#### Annotation 4762916228364

 #MLBook #brainstorming #machine-learning The book also comes in handy when brainstorming at the beginning of a project, when you try to answer the question of whether a given technical or business problem is “machine-learnable” and, if so, which techniques you should try to solve it.


#### Annotation 4762918849804

 #MLBook #data-origin #machine-learning Machine learning is a subfield of computer science that is concerned with building algorithms which, to be useful, rely on a collection of examples of some phenomenon. These examples can come from nature, be handcrafted by humans or generated by another algorithm.


#### Annotation 4762921209100

 #MLBook #machine-learning #types Learning can be supervised, semi-supervised, unsupervised, or reinforcement learning.


#### Annotation 4762923568396

 #MLBook #classes #dataset #feature-vector #label #labeled-examples #machine-learning #supervised-learning In supervised learning, the dataset is the collection of labeled examples $$\{(\mathbf x_i, y_i)\}_{i=1}^N$$. Each element $$\mathbf x_i$$ among the $$N$$ examples is called a feature vector. A feature vector is a vector in which each dimension $$j = 1, \ldots, D$$ contains a value that describes the example somehow. That value is called a feature and is denoted as $$x^{(j)}$$. For instance, if each example $$\mathbf x$$ in our collection represents a person, then the first feature, $$x^{(1)}$$, could contain height in cm, the second feature, $$x^{(2)}$$, could contain weight in kg, $$x^{(3)}$$ could contain gender, and so on. For all examples in the dataset, the feature at position $$j$$ in the feature vector always contains the same kind of information. It means that if $$x^{(2)}_i$$ contains weight in kg in some example $$\mathbf x_i$$, then $$x^{(2)}_k$$ will also contain weight in kg in every example $$\mathbf x_k, k = 1, \ldots, N$$. The label $$y_i$$ can be either an element belonging to a finite set of classes $$\{1, 2, \ldots, C\}$$, or a real number, or a more complex structure, like a vector, a matrix, a tree, or a graph. Unless otherwise stated, in this book $$y_i$$ is either one of a finite set of classes or a real number. You can see a class as a category to which an example belongs. For instance, if your examples are email messages and your problem is spam detection, then you have two classes $$\{spam, not\_spam\}$$.
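The notation above can be made concrete with a tiny plain-Python sketch of a labeled dataset (all values invented for illustration):

```python
# A toy labeled dataset {(x_i, y_i)}_{i=1}^N with N = 4 examples and
# D = 3 features per vector: height (cm), weight (kg), gender (0/1).
X = [
    [170.0, 65.0, 0],
    [182.0, 80.0, 1],
    [158.0, 52.0, 0],
    [175.0, 74.0, 1],
]
y = [0, 1, 0, 1]  # labels drawn from the finite set of classes {0, 1}

N = len(X)           # number of examples
D = len(X[0])        # dimensionality of each feature vector
x1_weight = X[0][1]  # x_1^{(2)}: the weight feature of the first example
```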


#### Flashcard 4763841596684

Question
An airway is present if the patient is conscious and speaking in a normal tone of voice.




#### Flashcard 4763845528844

Question
What happens when we begin thinking too much about technique and attempt to exercise too much deliberate control over our muscles?
We shift control back to the cerebral cortex and disrupt the cerebellum’s ability to run off these motor programs automatically, leading to mistakes.



#### Annotation 4763854966028

 #MLBook #goal #model #supervised-learning The goal of a supervised learning algorithm is to use the dataset to produce a model that takes a feature vector $$\mathbf x$$ as input and outputs information that allows deducing the label for this feature vector. For instance, the model created using the dataset of people could take as input a feature vector describing a person and output a probability that the person has cancer.


#### Annotation 4763858898188

 #MLBook #clustering #dimensionality-reduction #machine-learning #model #outlier-detection #unsupervised-learning In unsupervised learning, the dataset is a collection of unlabeled examples $$\{\mathbf x_i\}^N_{i=1}$$. Again, $$\mathbf x$$ is a feature vector, and the goal of an unsupervised learning algorithm is to create a model that takes a feature vector $$\mathbf x$$ as input and either transforms it into another vector or into a value that can be used to solve a practical problem. For example, in clustering , the model returns the id of the cluster for each feature vector in the dataset. In dimensionality reduction, the output of the model is a feature vector that has fewer features than the input $$\mathbf x$$; in outlier detection, the output is a real number that indicates how $$\mathbf x$$ is different from a “typical” example in the dataset.
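As a minimal sketch of the outlier-detection flavor of unsupervised learning (not code from the book; the 1-D data and the z-score-style model are invented for illustration), the "trained" model maps an example to a real number indicating how different it is from a typical example:

```python
import statistics

def make_outlier_scorer(examples):
    """'Train' on unlabeled 1-D examples; return a model that maps a
    value x to a real number saying how far x is from a typical example."""
    mu = statistics.mean(examples)
    sigma = statistics.stdev(examples)
    return lambda x: abs(x - mu) / sigma  # distance in standard deviations

data = [9.8, 10.1, 10.0, 9.9, 10.2]  # unlabeled examples
score = make_outlier_scorer(data)
```

A typical value gets a small score, while a value far from the sample gets a large one.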


#### Annotation 4763862043916

 #MLBook #machine-learning #semi-supervised-learning In semi-supervised learning, the dataset contains both labeled and unlabeled examples. Usually, the quantity of unlabeled examples is much higher than the number of labeled examples. The goal of a semi-supervised learning algorithm is the same as that of a supervised learning algorithm. The hope is that using many unlabeled examples can help the learning algorithm find (we might say “produce” or “compute”) a better model. It might seem counterintuitive that learning could benefit from adding more unlabeled examples; it looks like we add more uncertainty to the problem. However, when you add unlabeled examples, you add more information about your problem: a larger sample better reflects the probability distribution from which the labeled data came. Theoretically, a learning algorithm should be able to leverage this additional information.


#### Annotation 4763864403212

 #MLBook #actions #expected-average-reward #policy #reinforcement-learning #rewards #state Reinforcement learning is a subfield of machine learning where the machine “lives” in an environment and is capable of perceiving the state of that environment as a vector of features. The machine can execute actions in every state. Different actions bring different rewards and could also move the machine to another state of the environment. The goal of a reinforcement learning algorithm is to learn a policy. A policy is a function (similar to the model in supervised learning) that takes the feature vector of a state as input and outputs an optimal action to execute in that state. The action is optimal if it maximizes the expected average reward. Reinforcement learning solves a particular kind of problem where decision making is sequential, and the goal is long-term, such as game playing, robotics, resource management, or logistics.
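A policy can be sketched as a function from states to actions. In this tiny tabular sketch the state names, actions, and expected-reward numbers are all invented, and the expected rewards are assumed to be already known rather than learned:

```python
# Hypothetical table of expected rewards for (state, action) pairs.
expected_reward = {
    ("low_battery", "recharge"): 10.0,
    ("low_battery", "explore"):  -5.0,
    ("charged",     "recharge"):  0.0,
    ("charged",     "explore"):   8.0,
}

def policy(state):
    """Return the action that maximizes the expected reward in `state`."""
    actions = ["recharge", "explore"]
    return max(actions, key=lambda a: expected_reward[(state, a)])
```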


#### Annotation 4763865976076

 #MLBook #inputs #machine-learning #outputs #supervised-learning The supervised learning process starts with gathering the data. The data for supervised learning is a collection of pairs (input, output). Input could be anything, for example, email messages, pictures, or sensor measurements. Outputs are usually real numbers or labels (e.g. “spam”, “not_spam”, “cat”, “dog”, “mouse”, etc.). In some cases, outputs are vectors (e.g., the four coordinates of the rectangle around a person in the picture), sequences (e.g. [“adjective”, “adjective”, “noun”] for the input “big beautiful car”), or have some other structure.


#### Annotation 4763869383948

 #MLBook #decision-boundary In machine learning, the boundary separating the examples of different classes is called the decision boundary.


#### Annotation 4763878558988

 #distance #line #point In the case of a line in the plane given by the equation $$ax + by + c = 0$$, where $$a$$, $$b$$ and $$c$$ are real constants with $$a$$ and $$b$$ not both zero, the distance from the line to a point $$(x_0, y_0)$$ is[1][2]:p.14 $$\operatorname{distance}(ax+by+c=0, (x_0, y_0)) = \frac{|ax_0+by_0+c|}{\sqrt{a^2+b^2}}.$$ The point on this line which is closest to $$(x_0, y_0)$$ has coordinates:[3] $$x={\frac {b(bx_{0}-ay_{0})-ac}{a^{2}+b^{2}}}{\text{ and }}y={\frac {a(-bx_{0}+ay_{0})-bc}{a^{2}+b^{2}}}.$$

Distance from a point to a line - Wikipedia
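The two formulas translate directly into code; this is a sketch for checking values by hand, not library code:

```python
from math import sqrt

def point_line_distance(a, b, c, x0, y0):
    """Distance from the point (x0, y0) to the line ax + by + c = 0."""
    return abs(a * x0 + b * y0 + c) / sqrt(a * a + b * b)

def closest_point(a, b, c, x0, y0):
    """Coordinates of the point on ax + by + c = 0 closest to (x0, y0)."""
    d = a * a + b * b
    x = (b * (b * x0 - a * y0) - a * c) / d
    y = (a * (-b * x0 + a * y0) - b * c) / d
    return x, y
```

For example, for the line x + y - 2 = 0 and the origin, the closest point is (1, 1) at distance √2.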

#### Annotation 4763882229004

 #distance #straight-lines Because the lines are parallel, the perpendicular distance between them is a constant, so it does not matter which point is chosen to measure the distance. Given the equations of two non-vertical parallel lines $$y=mx+b_{1}\,$$ $$y=mx+b_{2}\,,$$ the distance between the two lines is the distance between the two intersection points of these lines with the perpendicular line $${\displaystyle y=-x/m\,.}$$ This distance can be found by first solving the linear systems $${\begin{cases}y=mx+b_{1}\\y=-x/m\,,\end{cases}}$$ and $${\begin{cases}y=mx+b_{2}\\y=-x/m\,,\end{cases}}$$ to get the coordinates of the intersection points. The solutions to the linear systems are the points $$\left(x_{1},y_{1}\right)\ =\left({\frac {-b_{1}m}{m^{2}+1}},{\frac {b_{1}}{m^{2}+1}}\right)\,,$$ and $$\left(x_{2},y_{2}\right)\ =\left({\frac {-b_{2}m}{m^{2}+1}},{\frac {b_{2}}{m^{2}+1}}\right)\,.$$ The distance between the points is $$d={\sqrt {\left({\frac {b_{1}m-b_{2}m}{m^{2}+1}}\right)^{2}+\left({\frac {b_{2}-b_{1}}{m^{2}+1}}\right)^{2}}}\,,$$ which reduces to $$d={\frac {|b_{2}-b_{1}|}{{\sqrt {m^{2}+1}}}}\,.$$ When the lines are given by $$ax+by+c_{1}=0\,$$ $$ax+by+c_{2}=0,\,$$ the distance between them can be expressed as $$d={\frac {|c_{2}-c_{1}|}{{\sqrt {a^{2}+b^{2}}}}}.$$

Distance between two straight lines - Wikipedia
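Both forms of the parallel-line distance formula can be checked against each other in a short sketch (the same pair of lines written both ways):

```python
from math import sqrt

def parallel_distance_slope(m, b1, b2):
    """Distance between y = m*x + b1 and y = m*x + b2."""
    return abs(b2 - b1) / sqrt(m * m + 1)

def parallel_distance_general(a, b, c1, c2):
    """Distance between a*x + b*y + c1 = 0 and a*x + b*y + c2 = 0."""
    return abs(c2 - c1) / sqrt(a * a + b * b)

# y = x and y = x + 2 are the same lines as x - y = 0 and x - y + 2 = 0,
# so both formulas should give the same distance (2 / sqrt(2) = sqrt(2)).
```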

#### Annotation 4763885899020

 #MLBook #accuracy #classification-learning-algorithm #decision-boundary Any classification learning algorithm that builds a model implicitly or explicitly creates a decision boundary. The decision boundary can be straight, or curved, or it can have a complex form, or it can be a superposition of some geometrical figures. The form of the decision boundary determines the accuracy of the model (that is, the ratio of examples whose labels are predicted correctly). The form of the decision boundary, the way it is algorithmically or mathematically computed based on the training data, differentiates one learning algorithm from another.
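Accuracy as defined here (the ratio of correctly predicted labels) is a one-liner; the toy labels below are invented for illustration:

```python
# Toy true labels and model predictions (invented for illustration).
y_true = ["spam", "not_spam", "spam", "spam", "not_spam"]
y_pred = ["spam", "not_spam", "not_spam", "spam", "not_spam"]

# Ratio of examples whose labels are predicted correctly.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```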


#### Annotation 4763889831180

 #MLBook #learning-algorithms #machine-learning #prediction-processing-time #speed-of-model-building In practice, there are two other essential differentiators of learning algorithms to consider: speed of model building and prediction processing time. In many practical cases, you would prefer a learning algorithm that builds a less accurate model fast. Additionally, you might prefer a less accurate model that is much quicker at making predictions.


#### Annotation 4763892452620

 [unknown IMAGE 4763872791820] #MLBook #error #has-images #machine-learning #prediction #probability Why is a machine-learned model capable of correctly predicting the labels of new, previously unseen examples? To understand that, look at the plot in Figure 1. If two classes are separable from one another by a decision boundary, then, obviously, examples that belong to each class are located in the two different subspaces that the decision boundary creates. If the examples used for training were selected randomly, independently of one another, and following the same procedure, then, statistically, it is more likely that a new negative example will be located on the plot somewhere not too far from other negative examples. The same holds for a new positive example: it will likely come from the surroundings of other positive examples. In such a case, our decision boundary will still, with high probability, separate new positive and negative examples well from one another. For other, less likely situations, our model will make errors, but because such situations are less likely, the number of errors will likely be smaller than the number of correct predictions. Intuitively, the larger the set of training examples, the less likely it is that the new examples will be dissimilar to (and lie on the plot far from) the examples used for training.


#### Annotation 4763896909068

 #MLBook #machine-learning #set A set is an unordered collection of unique elements.


#### Annotation 4763899268364

 #MLBook #cardinality-operator #machine-learning The cardinality operator $$\left\vert \mathcal S \right\vert$$ returns the number of elements in set $$\mathcal S$$.


#### Annotation 4763901627660

 #MLBook #codomain #domain #function #machine-learning A function is a relation that associates each element $$x$$ of a set $$\mathcal X$$ , the domain of the function, to a single element $$y$$ of another set $$\mathcal Y$$ , the codomain of the function.


#### Annotation 4763903986956

 [unknown IMAGE 4763906608396] #MLBook #has-images #local-minimum #machine-learning We say that $$f(x)$$ has a local minimum at $$x = c$$ if $$f(x) \ge f(c)$$ for every $$x$$ in some open interval around $$x = c$$.


#### Annotation 4763910016268

 #MLBook #machine-learning #vector-function A vector function, denoted as $$\mathbf y = \mathbf f(x)$$, is a function that returns a vector $$\mathbf y$$. It can have a vector or a scalar argument.


#### Annotation 4763912375564

 #MLBook #arg-max #machine-learning #max Given a set of values $$\mathcal A = \{a_1, a_2, \ldots, a_n \}$$, the operator $$\max_{a \in \mathcal A} f(a)$$ returns the highest value of $$f(a)$$ over all elements of the set $$\mathcal A$$. On the other hand, the operator $$\arg \max_{a \in \mathcal A} f(a)$$ returns the element of the set $$\mathcal A$$ that maximizes $$f(a)$$. Sometimes, when the set is implicit or infinite, we can write $$\max_a f(a)$$ or $$\arg \max_a f(a)$$.
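Python's built-in `max` expresses both operators; the set and the function below are invented for illustration:

```python
# max vs arg max over a finite set A, for f(a) = -(a - 3)^2,
# which peaks at a = 3 with value 0.
A = [0, 1, 2, 3, 4, 5]
f = lambda a: -(a - 3) ** 2

max_value = max(f(a) for a in A)  # highest value of f over A
arg_max = max(A, key=f)           # element of A that achieves it
```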


#### Annotation 4763914734860

 #MLBook #gradient #machine-learning #partial-derivatives Gradient is the generalization of derivative for functions that take several inputs (or one input in the form of a vector or some other complex structure). A gradient of a function is a vector of partial derivatives. You can look at finding a partial derivative of a function as the process of finding the derivative by focusing on one of the function’s inputs and by considering all other inputs as constant values.


#### Annotation 4763917094156

 #MLBook #gradient #machine-learning The gradient of a function $$f \left( \left[ x^{(1)}, x^{(2)} \right] \right)$$, denoted as $$\nabla f$$, is given by the vector $$\left[ \frac{\partial f}{\partial x^{(1)}}, \frac{\partial f}{\partial x^{(2)}} \right]$$.
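A gradient can be approximated numerically exactly as described above: vary one input while holding the others constant. This finite-difference sketch is an illustration, not the book's code:

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f: R^n -> R at point x (a list) by
    central differences: perturb one input, hold the others constant."""
    grad = []
    for j in range(len(x)):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

# f([x1, x2]) = x1^2 + 3*x2 has gradient [2*x1, 3].
g = numerical_gradient(lambda v: v[0] ** 2 + 3 * v[1], [2.0, 5.0])
```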


#### Annotation 4763920502028

 #MLBook #machine-learning #random-variable A random variable, usually written as an italic capital letter, like $$X$$ , is a variable whose possible values are numerical outcomes of a random phenomenon. Remark: In the following, the author gives an example in which red, yellow, and blue are possible values. So, the outcomes are not necessarily numbers.


#### Annotation 4763926007052

 [unknown IMAGE 4763923909900] #MLBook #has-images #machine-learning #pmf #probability-distribution #probability-mass-function The probability distribution of a discrete random variable is described by a list of probabilities associated with each of its possible values. This list of probabilities is called a probability mass function (pmf). For example: $$\operatorname{Pr}(X=red) = 0.3$$, $$\operatorname{Pr}(X=yellow) = 0.45$$, $$\operatorname{Pr}(X=blue) = 0.25$$. Each probability in a probability mass function is a value greater than or equal to 0. The sum of probabilities equals 1 (Figure 3a).
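The pmf from the example can be written down and its two defining properties checked directly:

```python
# The pmf from the example as a dictionary: a valid pmf has
# nonnegative probabilities that sum to 1.
pmf = {"red": 0.3, "yellow": 0.45, "blue": 0.25}

assert all(p >= 0 for p in pmf.values())
total = sum(pmf.values())
```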


#### Annotation 4763929677068

 [unknown IMAGE 4763923909900] #MLBook #has-images #machine-learning #pdf #probability-density-function Because the number of values of a continuous random variable $$X$$ is infinite, the probability $$\operatorname{Pr}(X=c)$$ for any $$c$$ is 0. Therefore, instead of a list of probabilities, the probability distribution of a continuous random variable (a continuous probability distribution) is described by a probability density function (pdf). The pdf is a function whose codomain is nonnegative and whose area under the curve equals 1 (Figure 3b).


#### Annotation 4763933609228

 #MLBook #expectation #expected-value #machine-learning #statistics Let a discrete random variable $$X$$ have $$k$$ possible values $$\{ x_i \}_{i=1}^k$$. The expectation of $$X$$ denoted as $$\mathbb E[X]$$ is given by, \begin{align} \mathbb E[X] & \stackrel{\textrm{def}}{=} \sum_{i=1}^k \left[ x_i \cdot \textrm{Pr} \left( X = x_i \right) \right] \\ & = x_1 \cdot \textrm{Pr} \left( X = x_1 \right) + x_2 \cdot \textrm{Pr} \left( X = x_2 \right) + \cdots + x_k \cdot \textrm{Pr} \left( X = x_k \right) \end{align} where $$\textrm{Pr} \left( X = x_i \right)$$ is the probability that $$X$$ has the value $$x_i$$ according to the pmf. The expectation of a random variable is also called the mean, average or expected value and is frequently denoted with the letter $$\mu$$ . The expectation is one of the most important statistics of a random variable.
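The definition translates directly into a weighted sum; the pmf below is invented for illustration:

```python
# E[X] = sum_i x_i * Pr(X = x_i), for a made-up pmf over {1, 2, 3}.
values = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]

mu = sum(x * p for x, p in zip(values, probs))
# 1*0.2 + 2*0.5 + 3*0.3 = 2.1
```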


#### Annotation 4763936230668

 #MLBook #machine-learning #standard-deviation #variance Another important statistic is the standard deviation, defined as, $$\sigma \stackrel{\textrm{def}}{=} \sqrt{\mathbb E \left[ \left( X - \mu\right)^2 \right] }.$$ Variance, denoted as $$\sigma^2$$ or $$var(X)$$, is defined as, $$\sigma^2 \stackrel{\textrm{def}}{=} \mathbb E \left[ \left( X - \mu\right)^2 \right].$$ For a discrete random variable, the standard deviation is given by: $$\sigma \stackrel{\textrm{def}}{=} \sqrt{\textrm{Pr} \left( X = x_1 \right) \left( x_1 - \mu \right)^2 + \textrm{Pr} \left( X = x_2 \right) \left( x_2 - \mu \right)^2 + \cdots + \textrm{Pr} \left( X = x_k \right) \left( x_k - \mu \right)^2},$$ where $$\mu = \mathbb E \left[ X \right]$$.
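Continuing with an invented pmf, variance and standard deviation follow the definitions term by term:

```python
from math import sqrt

# A made-up pmf over {1, 2, 3} with E[X] = 2.1.
values = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]

mu = sum(x * p for x, p in zip(values, probs))               # E[X]
var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))  # sigma^2
sigma = sqrt(var)                                            # sigma
```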


#### Annotation 4763940687116

 #MLBook #continuous-random-variable #expectation #machine-learning The expectation of a continuous random variable $$X$$ is given by, $$\mathbb E \left[ X \right] \stackrel{\textrm{def}}{=} \int_{\mathbb R} x f_X \left( x \right) dx,$$ where $$f_X$$ is the pdf of the variable $$X$$ and $$\int_{\mathbb R}$$ is the integral of function $$x f_X$$ .


#### Annotation 4763943046412

 #MLBook #dataset #examples #machine-learning #sample Most of the time we don’t know $$f_X$$ , but we can observe some values of $$X$$. In machine learning, we call these values examples, and the collection of these examples is called a sample or a dataset.


#### Annotation 4763946716428

 #MLBook #machine-learning #sample-statistic #unbiased-estimators Because $$f_X$$ is usually unknown, but we have a sample $$S_X = \{ x_i \}_{i=1}^N$$, we often content ourselves not with the true values of statistics of the probability distribution, such as expectation, but with their unbiased estimators. We say that $$\hat{\theta} \left( S_X \right)$$ is an unbiased estimator of some statistic $$\theta$$ calculated using a sample $$S_X$$ drawn from an unknown probability distribution if $$\hat{\theta} \left( S_X \right)$$ has the following property: $$\mathbb E \left[ \hat{\theta} \left( S_X \right) \right] = \theta,$$ where $$\hat{\theta}$$ is a sample statistic obtained using a sample $$S_X$$, not the real statistic $$\theta$$ that can be obtained only by knowing $$X$$; the expectation is taken over all possible samples drawn from $$X$$. Intuitively, this means that if you could have an unlimited number of samples like $$S_X$$, and you computed some unbiased estimator, such as $$\hat{\mu}$$, using each sample, then the average of all these $$\hat{\mu}$$ would equal the real statistic $$\mu$$ that you would get by computing it on $$X$$.


#### Annotation 4764500626700

 #MLBook #machine-learning #sample-mean It can be shown that an unbiased estimator of an unknown $$\mathbb E \left[ X \right]$$ (given by either eq. 1 or eq. 2) is given by $$\frac{1}{N} \sum_{i=1}^N x_i$$ (called in statistics the sample mean).
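A quick simulation illustrates the claim: average many sample means drawn from a known distribution (a fair die, whose expectation is 3.5) and the result approaches the true expectation. The die and the sample sizes are invented for illustration:

```python
import random

random.seed(0)

def sample_mean(n):
    """Sample mean of n draws from a fair six-sided die (E[X] = 3.5)."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Average the unbiased estimator over many independent samples.
estimates = [sample_mean(50) for _ in range(2000)]
avg_of_estimates = sum(estimates) / len(estimates)
```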


#### Annotation 4764502985996

 #Bayes-rule #Bayes-theorem #MLBook #machine-learning The conditional probability $$\textrm{Pr} \left( X=x \vert Y=y \right)$$ is the probability that the random variable $$X$$ has a specific value $$x$$ given that another random variable $$Y$$ has a specific value $$y$$. Bayes’ Rule (also known as Bayes’ Theorem) stipulates that: $$\textrm{Pr} \left( X=x \vert Y=y \right) = \displaystyle \frac{\textrm{Pr} \left( Y=y \vert X=x \right) \textrm{Pr} \left( X=x \right)}{\textrm{Pr} \left( Y=y \right)}$$.
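Bayes' Rule in code, with invented numbers for a spam-flavored example; the denominator Pr(Y=y) is expanded by the law of total probability:

```python
# Invented numbers: Pr(word | spam) = 0.6, Pr(spam) = 0.2,
# Pr(word | not_spam) = 0.1, Pr(not_spam) = 0.8.
p_word_given_spam = 0.6
p_spam = 0.2
p_word = 0.6 * 0.2 + 0.1 * 0.8  # total probability of seeing the word

# Bayes' Rule: Pr(spam | word) = Pr(word | spam) Pr(spam) / Pr(word).
p_spam_given_word = p_word_given_spam * p_spam / p_word
```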


#### Annotation 4764513733900

 #MLBook #machine-learning #review 2.5 Parameter Estimation


#### Annotation 4764516093196

 #MLBook #hyperparameter #machine-learning A hyperparameter is a property of a learning algorithm, usually (but not always) having a numerical value. That value influences the way the algorithm works. Hyperparameters aren’t learned by the algorithm itself from data. They have to be set by the data analyst before running the algorithm. I show how to do that in Chapter 5.


#### Annotation 4764518452492

 #MLBook #machine-learning #parameters Parameters are variables that define the model learned by the learning algorithm. Parameters are directly modified by the learning algorithm based on the training data. The goal of learning is to find such values of parameters that make the model optimal in a certain sense.


#### Annotation 4764520811788

 #MLBook #classification #label #machine-learning #unlabeled-example Classification is a problem of automatically assigning a label to an unlabeled example. Spam detection is a famous example of classification.


#### Annotation 4764523171084

 #MLBook #classification-learning-algorithm #labeled-examples #machine-learning #model In machine learning, the classification problem is solved by a classification learning algorithm that takes a collection of labeled examples as inputs and produces a model that can take an unlabeled example as input and either directly output a label or output a number that can be used by the analyst to deduce the label. An example of such a number is a probability.


#### Annotation 4764525530380

 #MLBook #binary-classification #binomial-classification #classes #machine-learning #multiclass-classification #multinomial-classification In a classification problem, a label is a member of a finite set of classes. If the size of the set of classes is two (“sick”/“healthy”, “spam”/“not_spam”), we talk about binary classification (also called binomial in some sources). Multiclass classification (also called multinomial) is a classification problem with three or more classes.


#### Annotation 4764528413964

 #MLBook #machine-learning #regression #target Regression is a problem of predicting a real-valued label (often called a target) given an unlabeled example. Estimating house price valuation based on house features, such as area, the number of bedrooms, location and so on is a famous example of regression.


#### Annotation 4764530773260

 #MLBook #machine-learning #regression-learning-algorithm The regression problem is solved by a regression learning algorithm that takes a collection of labeled examples as inputs and produces a model that can take an unlabeled example as input and output a target.


#### Annotation 4764533132556

 #MLBook #machine-learning #model-based-learning #model-parameters Most supervised learning algorithms are model-based. We have already seen one such algorithm: SVM. Model-based learning algorithms use the training data to create a model that has parameters learned from the training data. In SVM, the two parameters we saw were $$\mathbf w^\ast$$ and $$b^\ast$$ . After the model was built, the training data can be discarded.


#### Annotation 4764536016140

 #MLBook #instance-based #k-nearest-neighbors #kNN #learning #machine-learning Instance-based learning algorithms use the whole dataset as the model. One instance-based algorithm frequently used in practice is k-Nearest Neighbors (kNN). In classification, to predict a label for an input example the kNN algorithm looks at the close neighborhood of the input example in the space of feature vectors and outputs the label that it saw the most often in this close neighborhood.
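A minimal kNN sketch (not the book's implementation; the toy dataset is invented) using Euclidean distance and a majority vote over the k nearest neighbors:

```python
from collections import Counter
from math import dist

def knn_predict(dataset, x, k=3):
    """dataset: list of (feature_vector, label) pairs. Predict the label
    of x as the most common label among its k nearest neighbors."""
    neighbors = sorted(dataset, key=lambda pair: dist(pair[0], x))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

# Toy 2-D dataset with two classes "a" and "b".
data = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
        ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
```

A query near the "a" cluster is labeled "a"; one near the "b" cluster is labeled "b".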


#### Annotation 4764539424012

 #MLBook #deep-learning #deep-neural-networks #layer #machine-learning #neural-network #shallow-learning A shallow learning algorithm learns the parameters of the model directly from the features of the training examples. Most supervised learning algorithms are shallow. The notable exceptions are neural network learning algorithms, specifically those that build neural networks with more than one layer between input and output. Such neural networks are called deep neural networks. In deep neural network learning (or, simply, deep learning), contrary to shallow learning, most model parameters are learned not directly from the features of the training examples, but from the outputs of the preceding layers.


#### Annotation 4764615707916

 #Clinique #EBM #Médecine #Sémiologie Sensitivity is the proportion of patients with the diagnosis who have the physical sign (i.e., have the positive result). Specificity is the proportion of patients without the diagnosis who lack the physical sign (i.e., have the negative result)
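Both proportions follow directly from confusion-matrix counts; the numbers below are invented for illustration:

```python
# Counts (invented): among 100 patients WITH the diagnosis, 80 have the
# sign (true positives) and 20 lack it (false negatives); among 100
# patients WITHOUT the diagnosis, 90 lack the sign (true negatives)
# and 10 have it (false positives).
tp, fn = 80, 20
tn, fp = 90, 10

sensitivity = tp / (tp + fn)  # proportion of diseased with the sign
specificity = tn / (tn + fp)  # proportion of non-diseased without it
```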


#### Annotation 4768344444172

 The two ovaries contain thousands of follicles, each with an oocyte surrounded by a layer of granulosa cells and thecal cells. These supporting cells produce steroids and paracrine products important in follicular maturation and the coordination of events in reproduction


#### Annotation 4768349424908

 Until week 8 of gestation, the sex of the embryo cannot be determined morphologically; therefore, this period is termed the indifferent phase of sexual development. After this time, differentiation of the internal and external genitalia occurs, determining the phenotypic sex of the individual, which becomes fully developed after puberty.


#### Annotation 4768350473484

 After 8 weeks of gestation, the production of anti-müllerian hormone by Sertoli cells in the fetal testes leads to regression of the müllerian ducts, whereas production of testosterone by the Leydig cells leads to the persistence of the wolffian duct and the subsequent development of the prostate, epididymis, and seminal vesicles. In the absence of these secretions, female internal reproductive organs are formed from the müllerian ducts, and the wolffian structures degenerate.


#### Annotation 4768355192076

 #Médecine #Pathophysiology-Of-Disease #Physiologie During female development, the ovaries contain about 7 million oogonia by 24 weeks of gestation. The majority of these cells die during intrauterine life, leaving only about 1 million primary oocytes at birth. This decreases to about 400,000 by puberty.


#### Annotation 4768356502796

 #Médecine #Pathophysiology-Of-Disease #Physiologie The surviving oogonia are arrested at the prophase of meiosis I. Completion of the first meiotic division does not occur until the time of ovulation, and the second meiosis is completed with fertilization. Only about 400 of these oocytes mature and are released by ovulation during a woman’s lifetime; the others undergo atresia at various stages of development.


#### Annotation 4768359124236

 #Médecine #Pathophysiology-Of-Disease #Physiologie The changes that occur in the brain and hypothalamus that initiate the onset of puberty involve, first, the establishment of sleep-dependent and, later, the truly pulsatile release of gonadotropin-releasing hormone (GnRH) from the hypothalamus. The hypothalamic kisspeptin/GPR54 ligand/receptor pair appears to be the key mediator of the onset of puberty.


#### Annotation 4768360172812

 #Médecine #Pathophysiology-Of-Disease #Physiologie Before about age 10 years in girls, gonadotropin secretion is at low levels and does not display a pulsatile character. After this age, the pulsatile release of GnRH begins and initiates folliculogenesis, leading to cyclic changes in estrogen and progesterone production. These changes allow estrogen-dependent tissues, such as the breasts and the endometrium, to begin their maturation. The appearance of breast development is referred to as thelarche, and the first menstrual period is termed menarche.


#### Annotation 4768362269964

 #Médecine #Pathophysiology-Of-Disease #Physiologie The menstrual cycle has three phases. The follicular phase typically lasts 12–14 days and culminates in the production of a mature oocyte. Initially, a cohort of follicles begins to grow, but ultimately a single dominant follicle is selected, and the rest undergo a process of degeneration and apoptotic death, termed atresia (Figure 22–5). The follicular phase is followed by ovulation, in which the dominant follicle releases its mature oocyte to be transported through the uterine tubes for fertilization and subsequent implantation in a receptive uterus. The third, luteal, phase also averages 14 days and is characterized by luteinization of the ruptured follicle to produce the corpus luteum.


#### Annotation 4768364629260

 #Médecine #Pathophysiology-Of-Disease #Physiologie Neurons within the hypothalamus synthesize the peptide GnRH, and its secretion is modulated by endogenous opioids and corticotropin-releasing hormone (CRH). GnRH is secreted directly into the portal circulation of the pituitary in a pulsatile fashion. This pulsatility is required for proper activation of its receptor located on the gonadotropes, which are cells located in the anterior pituitary. In response, the gonadotropes secrete the polypeptides FSH and LH, collectively called gonadotropins, which stimulate the ovary to produce estrogen and inhibin. Inhibin feeds back to suppress FSH secretion but has no effect on LH. Estrogen also affects the pituitary by increasing the number of GnRH receptors and its sensitivity to GnRH stimulation. With estradiol production by the ovaries, a critical concentration is reached for a sufficient time to induce a midcycle LH surge and subsequent ovulation. After this surge, high levels of progesterone produced by the corpus luteum suppress gonadotropin release for the duration of the luteal phase.


#### Annotation 4768366726412

 #Médecine #Pathophysiology-Of-Disease #Physiologie Activin acts in the ovary to augment the effect of FSH, increasing aromatase activity and increasing the production of FSH and LH receptors.


#### Annotation 4768367774988

 #Médecine #Pathophysiology-Of-Disease #Physiologie During the early follicular phase, FSH stimulates the growth of a cohort of follicles and increases the production of inhibin and activin in granulosa cells.


#### Annotation 4768369872140

 #Médecine #Pathophysiology-Of-Disease #Physiologie LH stimulates the production of androgens in the thecal cells, which is augmented by inhibin. Androgens diffuse into the granulosa cells to be converted to estrogens through the enzymatic reaction of aromatization.


#### Annotation 4768371445004

 #Médecine #Pathophysiology-Of-Disease #Physiologie The midcycle LH surge triggers the final steps of oocyte maturation and the resumption of meiosis within the dominant oocyte.


#### Annotation 4768372493580

 #Médecine #Pathophysiology-Of-Disease #Physiologie Continued secretion from the corpus luteum requires LH (or human chorionic gonadotropin [hCG], as discussed below) stimulation; in its absence, degeneration occurs.


#### Annotation 4768373542156

 #Médecine #Pathophysiology-Of-Disease #Physiologie During the follicular phase, the endometrium proliferates under the influence of estrogen, creating straight glands with thin secretions and microvascular proliferation. During the luteal phase, the high levels of estradiol and progesterone promote the maturation of the endometrium, which develops tortuous glands engorged with thick secretions and proteins (see Figure 22–2). Additionally, the endometrium secretes a number of endocrine and paracrine factors (Table 22–1). These changes optimize the environment for implantation. In the absence of pregnancy, the corpus luteum cannot sustain the high levels of progesterone production, and the endometrial vasculature cannot be maintained. This leads to a sloughing of the endometrium and the onset of menstruation, which is marked by the nadir of estradiol and progesterone levels, ending the cycle.


#### Annotation 4768374590732

 #Médecine #Pathophysiology-Of-Disease #Physiologie Most preparations of estrogen and progestin block the LH surge at midcycle, thereby preventing ovulation. However, other contraceptive actions include effects on estrogen- and progesterone-sensitive tissues, such as inducing antifertility changes in cervical mucus and the endometrial lining that are unfavorable to sperm transport and embryonic implantation, respectively.