
The epoch is bracketed by two major events in Earth's history.


The name is a combination of the Ancient Greek palæo-, meaning "old", and the Eocene Epoch (which succeeds the Paleocene), translating to "the old part of the Eocene". The epoch is bracketed by two major events in Earth's history: the K-Pg extinction event, brought on by an asteroid impact and volcanism, marked the beginning of the Paleocene and killed off 75% of living species, most famously the non-avian dinosaurs.

#MLBook

Let’s start by telling the truth: machines don’t learn. What a typical “learning machine” does is find a mathematical formula which, when applied to a collection of inputs (called “training data”), produces the desired outputs. This mathematical formula also generates the correct outputs for most other inputs (distinct from the training data), on the condition that those inputs come from the same or a similar statistical distribution as the one the training data was drawn from.


#MLBook #name-origin

So why the name “machine learning” then? The reason, as is often the case, is marketing: Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term in 1959 while at IBM. Similarly to how in the 2010s IBM tried to market the term “cognitive computing” to stand out from competition, in the 1960s, IBM used the new cool term “machine learning” to attract both clients and talented employees.


#MLBook #definition #machine-learning

Machine learning is a universally recognized term that usually refers to the science and engineering of building machines capable of doing various useful things without being explicitly programmed to do so.


#MLBook #brainstorming #machine-learning

The book also comes in handy when brainstorming at the beginning of a project, when you try to answer the question whether a given technical or business problem is “machine-learnable” and, if yes, which techniques you should try to solve it.


#MLBook #data-origin #machine-learning

Machine learning is a subfield of computer science that is concerned with building algorithms which, to be useful, rely on a collection of examples of some phenomenon. These examples can come from nature, be handcrafted by humans or generated by another algorithm.


#MLBook #machine-learning #types

Learning can be supervised, semi-supervised, unsupervised and reinforcement.


#MLBook #classes #dataset #feature-vector #label #labeled-examples #machine-learning #supervised-learning

In **supervised learning** , the **dataset** is the collection of **labeled examples** \(\{(\mathbf x_i , y_i)\}^N_{i=1}\) . Each element \(\mathbf x_i\) among \(N\) is called a **feature vector** . A feature vector is a vector in which each dimension \(j = 1 , . . . , D\) contains a value that describes the example somehow. That value is called a **feature** and is denoted as \(x^{(j)}\) . For instance, if each example \(\mathbf x\) in our collection represents a person, then the first feature, \(x^{(1)}\) , could contain height in cm, the second feature, \(x^{(2)}\) , could contain weight in kg, \(x^{(3)}\) could contain gender, and so on. For all examples in the dataset, the feature at position \(j\) in the feature vector always contains the same kind of information. It means that if \(x^{(2)}_i\) contains weight in kg in some example \(\mathbf x_i\) , then \(x^{(2)}_k\) will also contain weight in kg in every example \(\mathbf x_k , k = 1 , . . . , N\) . The **label** \(y_i\) can be either an element belonging to a finite set of classes \(\{1 , 2 , . . . , C\}\) , or a real number, or a more complex structure, like a vector, a matrix, a tree, or a graph. Unless otherwise stated, in this book \(y_i\) is either one of a finite set of classes or a real number. You can see a class as a category to which an example belongs. For instance, if your examples are email messages and your problem is spam detection, then you have two classes \(\{spam, not\_spam\}\).
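To make the notation concrete, here is a tiny sketch of a labeled dataset in Python for the spam example; the feature names and values are hypothetical, not from the book:

```python
# A toy supervised-learning dataset: N labeled examples (x_i, y_i).
# Each feature vector has D = 3 dimensions, and the same position always
# holds the same kind of information (here: word count, link count, and
# fraction of capitalized words -- all made-up features).
dataset = [
    ([120, 0, 0.02], "not_spam"),
    ([35, 7, 0.40], "spam"),
    ([80, 1, 0.05], "not_spam"),
    ([12, 9, 0.55], "spam"),
]

N = len(dataset)            # number of labeled examples
D = len(dataset[0][0])      # dimensionality of each feature vector
classes = sorted({y for _, y in dataset})

x_2 = dataset[1][0]         # feature vector of the second example
y_2 = dataset[1][1]         # its label
```

Here the labels come from a finite set of two classes, matching the binary spam-detection case described above.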


Question

An airway is present if the patient is conscious and speaking in a normal tone of voice.

Answer

An airway is present if the patient is conscious and speaking in a normal tone of voice.


Question

[default - edit me]

Answer

An airway is present


Question

What happens when we begin thinking too much about technique and attempt to exercise too much deliberate control over our muscles?

Answer

We shift control back to the cerebral cortex and disrupt the cerebellum’s ability to run off these motor programs automatically, leading to mistakes.



#MLBook #goal #model #supervised-learning

The goal of a **supervised learning algorithm** is to use the dataset to produce a **model** that takes a feature vector \(\mathbf x\) as input and outputs information that allows deducing the label for this feature vector. For instance, the model created using the dataset of people could take as input a feature vector describing a person and output a probability that the person has cancer.


#MLBook #clustering #dimensionality-reduction #machine-learning #model #outlier-detection #unsupervised-learning

In **unsupervised learning**, the dataset is a collection of **unlabeled examples** \(\{\mathbf x_i\}^N_{i=1}\). Again, \(\mathbf x\) is a feature vector, and the goal of an **unsupervised learning algorithm** is to create a **model** that takes a feature vector \(\mathbf x\) as input and either transforms it into another vector or into a value that can be used to solve a practical problem. For example, in **clustering** , the model returns the id of the cluster for each feature vector in the dataset. In **dimensionality reduction**, the output of the model is a feature vector that has fewer features than the input \(\mathbf x\); in **outlier detection**, the output is a real number that indicates how \(\mathbf x\) is different from a “typical” example in the dataset.
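As a minimal sketch of the unsupervised setting (my example, not the book's), an outlier-detection "model" can map each unlabeled feature vector to a real number, here simply its Euclidean distance from the sample mean:

```python
import math

# Unlabeled examples: feature vectors only, no labels.
X = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [5.0, 5.0]]

D = len(X[0])
mean = [sum(x[j] for x in X) / len(X) for j in range(D)]

def outlier_score(x):
    """Distance of x from the dataset mean: the larger, the more atypical."""
    return math.sqrt(sum((x[j] - mean[j]) ** 2 for j in range(D)))

scores = [outlier_score(x) for x in X]
# The fourth example sits far from the other three, so its score is largest.
```

A real outlier detector would use a more robust notion of "typical", but the input/output shape (feature vector in, real number out) is exactly the one described above.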


#MLBook #machine-learning #semi-supervised-learning

In **semi-supervised learning**, the dataset contains both labeled and unlabeled examples. Usually, the quantity of unlabeled examples is much higher than the number of labeled examples. The goal of a **semi-supervised learning algorithm** is the same as the goal of the supervised learning algorithm. The hope here is that using many unlabeled examples can help the learning algorithm to find (we might say “produce” or “compute”) a better model.

It could look counter-intuitive that learning could benefit from adding more unlabeled examples. It seems like we add more uncertainty to the problem. However, when you add unlabeled examples, you add more information about your problem: a larger sample better reflects the probability distribution from which the labeled data came. Theoretically, a learning algorithm should be able to leverage this additional information.


#MLBook #actions #expected-average-reward #policy #reinforcement-learning #rewards #state

**Reinforcement learning** is a subfield of machine learning where the machine “lives” in an environment and is capable of perceiving the **state** of that environment as a vector of features. The machine can execute **actions** in every state. Different actions bring different **rewards** and could also move the machine to another state of the environment. The goal of a reinforcement learning algorithm is to learn a **policy**.

A policy is a function (similar to the model in supervised learning) that takes the feature vector of a state as input and outputs an optimal action to execute in that state. The action is optimal if it maximizes the **expected average reward**.

Reinforcement learning solves a particular kind of problem where decision making is sequential, and the goal is long-term, such as game playing, robotics, resource management, or logistics.


#MLBook #inputs #machine-learning #outputs #supervised-learning

The supervised learning process starts with gathering the data. The data for supervised learning is a collection of pairs (input, output). Input could be anything, for example, email messages, pictures, or sensor measurements. Outputs are usually real numbers, or labels (e.g. “spam”, “not_spam”, “cat”, “dog”, “mouse”, etc). In some cases, outputs are vectors (e.g., four coordinates of the rectangle around a person on the picture), sequences (e.g. [“adjective”, “adjective”, “noun”] for the input “big beautiful car”), or have some other structure.


#MLBook #decision-boundary

In machine learning, the boundary separating the examples of different classes is called the **decision boundary**.


#distance #line #point

In the case of a line in the plane given by the equation \(ax + by + c = 0\), where \(a\), \(b\) and \(c\) are real constants with \(a\) and \(b\) not both zero, the distance from the line to a point \((x_0, y_0)\) is

\(\operatorname{distance}(ax+by+c=0, (x_0, y_0)) = \frac{|ax_0+by_0+c|}{\sqrt{a^2+b^2}}. \)

The point on this line which is closest to \((x_0, y_0)\) has coordinates:

\(x={\frac {b(bx_{0}-ay_{0})-ac}{a^{2}+b^{2}}}{\text{ and }}y={\frac {a(-bx_{0}+ay_{0})-bc}{a^{2}+b^{2}}}.\)
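Both formulas translate directly into code; a small numerical check (my sketch, not part of the source):

```python
import math

def point_line_distance(a, b, c, x0, y0):
    """Distance from point (x0, y0) to the line ax + by + c = 0."""
    return abs(a * x0 + b * y0 + c) / math.sqrt(a**2 + b**2)

def closest_point(a, b, c, x0, y0):
    """Coordinates of the point on ax + by + c = 0 closest to (x0, y0)."""
    d = a**2 + b**2
    x = (b * (b * x0 - a * y0) - a * c) / d
    y = (a * (-b * x0 + a * y0) - b * c) / d
    return x, y

# Line x + y - 2 = 0 (a=1, b=1, c=-2) and the origin:
dist = point_line_distance(1, 1, -2, 0, 0)
px, py = closest_point(1, 1, -2, 0, 0)
```

For this line the distance from the origin is \(\sqrt 2\), and the closest point \((1, 1)\) indeed satisfies the line equation.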



#distance #straight-lines

Because the lines are parallel, the perpendicular distance between them is a constant, so it does not matter which point is chosen to measure the distance. Given the equations of two non-vertical parallel lines

\(y=mx+b_{1}\,\) \(y=mx+b_{2}\,,\)

the distance between the two lines is the distance between the two intersection points of these lines with the perpendicular line

\(y=-x/m\,.\)

This distance can be found by first solving the linear systems

\({\begin{cases}y=mx+b_{1}\\y=-x/m\,,\end{cases}}\)

and

\({\begin{cases}y=mx+b_{2}\\y=-x/m\,,\end{cases}}\)

to get the coordinates of the intersection points. The solutions to the linear systems are the points

\(\left(x_{1},y_{1}\right)\ =\left({\frac {-b_{1}m}{m^{2}+1}},{\frac {b_{1}}{m^{2}+1}}\right)\,,\)

and

\(\left(x_{2},y_{2}\right)\ =\left({\frac {-b_{2}m}{m^{2}+1}},{\frac {b_{2}}{m^{2}+1}}\right)\,.\)

The distance between the points is

\(d={\sqrt {\left({\frac {b_{1}m-b_{2}m}{m^{2}+1}}\right)^{2}+\left({\frac {b_{2}-b_{1}}{m^{2}+1}}\right)^{2}}}\,,\)

which reduces to

\(d={\frac {|b_{2}-b_{1}|}{{\sqrt {m^{2}+1}}}}\,.\)

When the lines are given by

\(ax+by+c_{1}=0\,\) \(ax+by+c_{2}=0,\,\)

the distance between them can be expressed as

\(d={\frac {|c_{2}-c_{1}|}{{\sqrt {a^{2}+b^{2}}}}}.\)
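A quick numerical sketch confirming that the slope-intercept and general-form formulas agree (the example lines are my choice):

```python
import math

def slope_form_distance(m, b1, b2):
    """Distance between parallel lines y = m*x + b1 and y = m*x + b2."""
    return abs(b2 - b1) / math.sqrt(m**2 + 1)

def general_form_distance(a, b, c1, c2):
    """Distance between parallel lines a*x + b*y + c1 = 0 and a*x + b*y + c2 = 0."""
    return abs(c2 - c1) / math.sqrt(a**2 + b**2)

# y = 2x and y = 2x + 5, equivalently 2x - y = 0 and 2x - y + 5 = 0:
d1 = slope_form_distance(2, 0, 5)
d2 = general_form_distance(2, -1, 0, 5)
```

Both give \(5/\sqrt{5} = \sqrt 5\), as expected from rewriting one form as the other.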



#MLBook #accuracy #classification-learning-algorithm #decision-boundary

Any classification learning algorithm that builds a model implicitly or explicitly creates a decision boundary. The decision boundary can be straight, or curved, or it can have a complex form, or it can be a superposition of some geometrical figures. The form of the decision boundary determines the accuracy of the model (that is, the ratio of examples whose labels are predicted correctly). The form of the decision boundary, and the way it is algorithmically or mathematically computed based on the training data, differentiates one learning algorithm from another.


#MLBook #learning-algorithms #machine-learning #prediction-processing-time #speed-of-model-building

In practice, there are two other essential differentiators of learning algorithms to consider: speed of model building and prediction processing time. In many practical cases, you would prefer a learning algorithm that builds a less accurate model fast. Additionally, you might prefer a less accurate model that is much quicker at making predictions.



#MLBook #error #has-images #machine-learning #prediction #probability

Why is a machine-learned model capable of predicting correctly the labels of new, previously unseen examples? To understand that, look at the plot in Figure 1. If two classes are separable from one another by a decision boundary, then, obviously, examples that belong to each class are located in two different subspaces which the decision boundary creates.

If the examples used for training were selected randomly, independently of one another, and following the same procedure, then, statistically, it is *more likely* that the new negative example will be located on the plot somewhere not too far from other negative examples. The same concerns the new positive example: it will *likely* come from the surroundings of other positive examples. In such a case, our decision boundary will still, with *high probability*, separate well new positive and negative examples from one another. For other, *less likely situations*, our model will make errors, but because such situations are less likely, the number of errors will likely be smaller than the number of correct predictions.

Intuitively, the larger the set of training examples, the less likely it is that the new examples will be dissimilar to (and lie on the plot far from) the examples used for training.


#MLBook #machine-learning #set

A set is an unordered collection of unique elements.


#MLBook #cardinality-operator #machine-learning

The cardinality operator \(\left\vert \mathcal S \right\vert\) returns the number of elements in set \(\mathcal S\).


#MLBook #codomain #domain #function #machine-learning

A function is a relation that associates each element \(x\) of a set \(\mathcal X\) , the *domain* of the function, to a single element \(y\) of another set \(\mathcal Y\) , the *codomain* of the function.



#MLBook #has-images #local-minimum #machine-learning

We say that \(f(x)\) has a *local minimum* at \(x = c\) if \(f(x) \ge f(c)\) for every \(x\) in some open interval around \(x = c\).


#MLBook #machine-learning #vector-function

A vector function, denoted as \(\mathbf y = \mathbf f(x)\), is a function that returns a vector \(\mathbf y\). It can have a vector or a scalar argument.


#MLBook #arg-max #machine-learning #max

Given a set of values \(\mathcal A = \{a_1, a_2, \ldots , a_n \}\), the operator \(\max_{a \in \mathcal A} f(a)\) returns the highest value \(f(a)\) for all elements in the set \(\mathcal A\). On the other hand, the operator \(\arg \max_{a \in \mathcal A} f(a)\) returns the element of the set \(\mathcal A\) that maximizes \(f(a)\). Sometimes, when the set is implicit or infinite, we can write \(\max_a f(a)\) or \(\arg \max_a f(a)\).
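A hedged sketch in Python, where the built-in `max` with a `key` function plays the role of arg max (the set and function are my examples):

```python
# Finite set A and a function f; max returns the highest f(a),
# while arg max returns the element of A achieving it.
A = [-3, 1, 4, 2]
f = lambda a: -(a - 2) ** 2   # peaks at a = 2

max_value = max(f(a) for a in A)   # max_{a in A} f(a)
arg_max = max(A, key=f)            # arg max_{a in A} f(a)
```

Note the distinction: `max_value` is a value of \(f\), while `arg_max` is an element of \(\mathcal A\).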



#MLBook #gradient #machine-learning

The gradient of function \(f \left( \left[ x^{(1)}, x^{(2)} \right] \right)\), denoted as \(\nabla f\), is given by the vector \(\left[ \frac{\partial f}{\partial x^{(1)}}, \frac{\partial f}{\partial x^{(2)}} \right]\).
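When no closed form is at hand, the gradient can be approximated numerically with central finite differences; a sketch (the test function and step size are my choices, not the book's):

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate [df/dx1, df/dx2, ...] at point x by central differences."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

# f([x1, x2]) = x1^2 + 3*x2; the analytic gradient at (2, 5) is [4, 3].
f = lambda x: x[0] ** 2 + 3 * x[1]
g = numerical_gradient(f, [2.0, 5.0])
```

The approximation matches the analytic partial derivatives up to discretization error.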


#MLBook #machine-learning #random-variable

A *random variable*, usually written as an italic capital letter, like \(X\) , is a variable whose possible values are numerical outcomes of a random phenomenon.

*Remark: In the following, the author gives an example in which red, yellow, and blue are possible values. So, the outcomes are not necessarily numbers.*



#MLBook #has-images #machine-learning #pmf #probability-distribution #probability-mass-function

The *probability distribution* of a discrete random variable is described by a list of probabilities associated with each of its possible values. This list of probabilities is called a *probability mass function* (pmf). For example: \(\operatorname{Pr}(X=red) = 0.3\), \(\operatorname{Pr}(X=yellow) = 0.45\), \(\operatorname{Pr}(X=blue) = 0.25\). Each probability in a probability mass function is a value greater than or equal to 0. The sum of probabilities equals 1 (Figure 3a).
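The two pmf properties (nonnegative probabilities that sum to 1) can be checked mechanically; a sketch using the colors example above:

```python
# The pmf from the example, as a dictionary from value to probability.
pmf = {"red": 0.3, "yellow": 0.45, "blue": 0.25}

all_nonnegative = all(p >= 0 for p in pmf.values())
total = sum(pmf.values())   # must equal 1 for a valid pmf
```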



#MLBook #has-images #machine-learning #pdf #probability-density-function

Because the number of values of a continuous random variable \(X\) is infinite, the probability \(\operatorname{Pr}(X=c)\) for any \(c\) is 0. Therefore, instead of a list of probabilities, the probability distribution of a continuous random variable (a continuous probability distribution) is described by a *probability density function* (pdf). The pdf is a function whose codomain is nonnegative and the area under the curve is equal to 1 (Figure 3b).


#MLBook #expectation #expected-value #machine-learning #statistics

Let a discrete random variable \(X\) have \(k\) possible values \(\{ x_i \}_{i=1}^k\). The *expectation* of \(X\), denoted as \(\mathbb E[X]\), is given by,

\(\begin{align} \mathbb E[X] & \stackrel{\textrm{def}}{=} \sum_{i=1}^k \left[ x_i \cdot \textrm{Pr} \left( X = x_i \right) \right] \\ & = x_1 \cdot \textrm{Pr} \left( X = x_1 \right) + x_2 \cdot \textrm{Pr} \left( X = x_2 \right) + \cdots + x_k \cdot \textrm{Pr} \left( X = x_k \right) \end{align}\)

where \(\textrm{Pr} \left( X = x_i \right)\) is the probability that \(X\) has the value \(x_i\) according to the pmf. The expectation of a random variable is also called the *mean*, *average* or *expected value* and is frequently denoted with the letter \(\mu\) . The expectation is one of the most important *statistics* of a random variable.
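A sketch computing this sum for a small made-up pmf:

```python
# E[X] = sum_i x_i * Pr(X = x_i) for a discrete random variable.
values = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]   # a valid pmf: nonnegative, sums to 1

expectation = sum(x * p for x, p in zip(values, probs))
```

For these numbers, \(\mathbb E[X] = 0.1 + 0.4 + 0.9 + 1.6 = 3\).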


#MLBook #machine-learning #standard-deviation #variance

Another important statistic is the *standard deviation*, defined as,

\(\sigma \stackrel{\textrm{def}}{=} \sqrt{\mathbb E \left[ \left( X - \mu\right)^2 \right] }.\)

*Variance*, denoted as \(\sigma^2\) or \(var(X)\), is defined as,

\(\sigma^2 \stackrel{\textrm{def}}{=} \mathbb E \left[ \left( X - \mu\right)^2 \right].\)

For a discrete random variable, the standard deviation is given by:

\(\sigma \stackrel{\textrm{def}}{=} \sqrt{\textrm{Pr} \left( X = x_1 \right) \left( x_1 - \mu \right)^2 + \textrm{Pr} \left( X = x_2 \right) \left( x_2 - \mu \right)^2 + \cdots + \textrm{Pr} \left( X = x_k \right) \left( x_k - \mu \right)^2},\)

where \(\mu = \mathbb E \left[ X \right]\).
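The same kind of made-up pmf works for variance and standard deviation; note that the standard deviation is simply the square root of the variance:

```python
import math

# Discrete X with a made-up pmf; variance = E[(X - mu)^2], sigma = sqrt(variance).
values = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

mu = sum(x * p for x, p in zip(values, probs))
variance = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
sigma = math.sqrt(variance)
```

Here \(\mu = 3\), so the variance is \(0.1 \cdot 4 + 0.2 \cdot 1 + 0.3 \cdot 0 + 0.4 \cdot 1 = 1\) and \(\sigma = 1\).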


#MLBook #continuous-random-variable #expectation #machine-learning

The expectation of a continuous random variable \(X\) is given by,

\(\mathbb E \left[ X \right] \stackrel{\textrm{def}}{=} \int_{\mathbb R} x f_X \left( x \right) dx,\)

where \(f_X\) is the pdf of the variable \(X\) and \(\int_{\mathbb R}\) is the integral of function \(x f_X\) .
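For a concrete sketch, take the uniform pdf on \([0, 1]\), where the exact expectation is \(0.5\), and approximate the integral with a midpoint Riemann sum (the discretization scheme is my choice, not the book's):

```python
# E[X] = integral of x * f_X(x) dx. For the uniform pdf on [0, 1]
# (f_X(x) = 1 there, 0 elsewhere), the exact answer is 0.5.
def f_X(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

n = 100_000
h = 1.0 / n
# Midpoint rule: evaluate x * f_X(x) at the center of each subinterval.
expectation = sum((i + 0.5) * h * f_X((i + 0.5) * h) * h for i in range(n))
```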


#MLBook #dataset #examples #machine-learning #sample

Most of the time we don’t know \(f_X\) , but we can observe some values of \(X\). In machine learning, we call these values **examples**, and the collection of these examples is called a **sample** or a **dataset**.


#MLBook #machine-learning #sample-statistic #unbiased-estimators

Because \(f_X\) is usually unknown, but we have a sample \(S_X = \{ x_i \}_{i=1}^N\) , we often content ourselves not with the true values of statistics of the probability distribution, such as expectation, but with their *unbiased estimators*.

We say that \(\hat{\theta} \left( S_X \right)\) is an unbiased estimator of some statistic \(\theta\) calculated using a sample \(S_X\) drawn from an unknown probability distribution if \(\hat{\theta} \left( S_X \right)\) has the following property:

\(\mathbb E \left[ \hat{\theta} \left( S_X \right) \right] = \theta,\)

where \(\hat{\theta}\) is a *sample statistic*, obtained using a sample \(S_X\) and not the real statistic \(\theta\) that can be obtained only knowing \(X\); the expectation is taken over all possible samples drawn from \(X\) . Intuitively, this means that if you can have an unlimited number of such samples as \(S_X\), and you compute some unbiased estimator, such as \(\hat{\mu}\) , using each sample, then the average of all these \(\hat{\mu}\) equals the real statistic \(\mu\) that you would get computed on \(X\).


#MLBook #machine-learning #sample-mean

It can be shown that an unbiased estimator of an unknown \(\mathbb E \left[ X \right]\) (given by either eq. 1 or eq. 2) is given by \(\frac{1}{N} \sum_{i=1}^N x_i\) (called in statistics the *sample mean*).
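An informal simulation (not a proof) of this unbiasedness: average many sample means and compare the result with the true expectation. The distribution and sample sizes are my choices:

```python
import random

random.seed(0)
values, probs = [1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4]
true_mu = 3.0   # E[X] for this pmf, computed analytically

def sample_mean(n):
    """Draw a sample of size n from X and return its sample mean."""
    draws = random.choices(values, weights=probs, k=n)
    return sum(draws) / n

# Each sample mean is noisy, but their average hugs the true expectation.
estimates = [sample_mean(50) for _ in range(2000)]
average_of_estimates = sum(estimates) / len(estimates)
```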


#Bayes-rule #Bayes-theorem #MLBook #machine-learning

The conditional probability \(\textrm{Pr} \left( X=x \vert Y=y \right)\) is the probability of the random variable \(X\) to have a specific value \(x\) given that another random variable \(Y\) has a specific value of \(y\). The **Bayes’ Rule** (also known as the **Bayes’ Theorem**) stipulates that:

\(\textrm{Pr} \left( X=x \vert Y=y \right) = \displaystyle \frac{\textrm{Pr} \left( Y=y \vert X=x \right) \textrm{Pr} \left( X=x \right)}{\textrm{Pr} \left( Y=y \right)}\).
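A sketch plugging made-up numbers into Bayes' Rule for the spam example, with the denominator \(\textrm{Pr}(Y=y)\) obtained via total probability:

```python
# Hypothetical numbers: Pr(spam), and Pr(word appears | spam / not spam).
p_spam = 0.2
p_word_given_spam = 0.9
p_word_given_not_spam = 0.1

# Total probability: Pr(word) = sum over both classes.
p_word = (p_word_given_spam * p_spam
          + p_word_given_not_spam * (1 - p_spam))

# Bayes' Rule: Pr(spam | word).
p_spam_given_word = p_word_given_spam * p_spam / p_word
```

With these numbers the posterior is \(0.18 / 0.26 = 9/13 \approx 0.69\): observing the word raises the probability of spam from 0.2 to about 0.69.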


#MLBook #machine-learning #review

2.5 Parameter Estimation


#MLBook #hyperparameter #machine-learning

A hyperparameter is a property of a learning algorithm, usually (but not always) having a numerical value. That value influences the way the algorithm works. Hyperparameters aren’t learned by the algorithm itself from data. They have to be set by the data analyst before running the algorithm. I show how to do that in Chapter 5.


#MLBook #machine-learning #parameters

Parameters are variables that define the model learned by the learning algorithm. Parameters are directly modified by the learning algorithm based on the training data. The goal of learning is to find such values of parameters that make the model optimal in a certain sense.

#MLBook #classification-learning-algorithm #labeled-examples #machine-learning #model

In machine learning, the classification problem is solved by a **classification learning algorithm** that takes a collection of **labeled examples** as inputs and produces a **model** that can take an unlabeled example as input and either directly output a label or output a number that can be used by the analyst to deduce the label. An example of such a number is a probability.

#MLBook #binary-classification #binomial-classification #classes #machine-learning #multiclass-classification #multinomial-classification

In a classification problem, a label is a member of a finite set of **classes**. If the size of the set of classes is two (“sick”/“healthy”, “spam”/“not_spam”), we talk about **binary classification** (also called **binomial** in some sources). **Multiclass classification** (also called **multinomial**) is a classification problem with three or more classes.

#MLBook #machine-learning #regression-learning-algorithm

The regression problem is solved by a **regression learning algorithm** that takes a collection of labeled examples as inputs and produces a model that can take an unlabeled example as input and output a target.

#MLBook #machine-learning #model-based-learning #model-parameters

Most supervised learning algorithms are model-based. We have already seen one such algorithm: SVM. Model-based learning algorithms use the training data to create a **model** that has **parameters** learned from the training data. In SVM, the two parameters we saw were \(\mathbf w^\ast\) and \(b^\ast\). Once the model is built, the training data can be discarded.
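A sketch of what "only the parameters survive" means in practice, assuming a linear SVM whose decision rule is \(\operatorname{sign}(\mathbf w^\ast \mathbf x - b^\ast)\); the numeric values of `w_star` and `b_star` below are hypothetical stand-ins for what training would produce:

```python
# Hypothetical learned parameters of a linear SVM; prediction needs
# only these numbers, not the training examples themselves.
w_star = [2.0, -1.0]
b_star = 0.5

def predict(x):
    """Return sign(w* . x - b*) as +1 or -1."""
    score = sum(wi * xi for wi, xi in zip(w_star, x)) - b_star
    return 1 if score >= 0 else -1

print(predict([1.0, 0.5]))  # score = 2.0 - 0.5 - 0.5 = 1.0 -> 1
```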

#MLBook #instance-based #k-nearest-neighbors #kNN #learning #machine-learning

Instance-based learning algorithms use the whole dataset as the model. One instance-based algorithm frequently used in practice is **k-Nearest Neighbors** (kNN). In classification, to predict a label for an input example, the kNN algorithm looks at the close neighborhood of the input example in the space of feature vectors and outputs the label it saw most often in this close neighborhood.
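The procedure above can be sketched in a few lines; this is a simplified illustration assuming Euclidean distance as the notion of "close" (the book does not fix a distance metric here), with ties broken arbitrarily:

```python
from collections import Counter
import math

def knn_predict(train, x, k=3):
    """train: list of (feature_vector, label) pairs -- the whole dataset
    *is* the model. Returns the majority label among the k training
    examples nearest to x under Euclidean distance."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (0.5, 0.5), k=3))  # a
```

Note that, unlike the SVM case, nothing can be discarded: every prediction scans the stored training examples.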

#MLBook #deep-learning #deep-neural-networks #layer #machine-learning #neural-network #shallow-learning

A shallow learning algorithm learns the parameters of the model directly from the features of the training examples. Most supervised learning algorithms are shallow. The notable exceptions are **neural network** learning algorithms, specifically those that build neural networks with more than one **layer** between input and output. Such neural networks are called **deep neural networks**. In deep neural network learning (or, simply, deep learning), contrary to shallow learning, most model parameters are learned not directly from the features of the training examples, but from the outputs of the preceding layers.

#Clinique #EBM #Médecine #Sémiologie

Sensitivity is the proportion of patients with the diagnosis who have the physical sign (i.e., have the positive result). Specificity is the proportion of patients without the diagnosis who lack the physical sign (i.e., have the negative result).
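In terms of the standard 2x2 confusion counts, these definitions reduce to two ratios; the counts below are hypothetical, chosen only to illustrate the arithmetic:

```python
def sensitivity(tp, fn):
    """Proportion of diseased patients showing the sign: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of non-diseased patients lacking the sign: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical cohort: 80 true positives, 20 false negatives,
# 90 true negatives, 10 false positives.
print(sensitivity(80, 20))  # 0.8
print(specificity(90, 10))  # 0.9
```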

#Médecine #Pathophysiology-Of-Disease #Physiologie

During female development, the female ovaries contain about 7 million oogonia by 24 weeks of gestation. The majority of these cells die during intrauterine life, leaving only about 1 million primary oocytes at birth. This decreases to about 400,000 by puberty

#Médecine #Pathophysiology-Of-Disease #Physiologie

The surviving oogonia are arrested at the prophase of meiosis I. Completion of the first meiotic division does not occur until the time of ovulation, and the second meiosis is completed with fertilization. Only about 400 of these oocytes mature and are released by ovulation during a woman’s lifetime; the others undergo atresia at various stages of development

#Médecine #Pathophysiology-Of-Disease #Physiologie

The changes that occur in the brain and hypothalamus that initiate the onset of puberty involve, first, the establishment of sleep-dependent and, later, the truly pulsatile release of gonadotropin-releasing hormone (GnRH) from the hypothalamus. The hypothalamic kisspeptin/GPR54 ligand/receptor pair appears to be the key mediator of the onset of puberty

#Médecine #Pathophysiology-Of-Disease #Physiologie

Before about age 10 years in girls, gonadotropin secretion is at low levels and does not display a pulsatile character. After this age, the pulsatile release of GnRH begins and initiates folliculogenesis, leading to cyclic changes in estrogen and progesterone production. These changes allow estrogen-dependent tissues, such as the breasts and the endometrium, to begin their maturation. The appearance of breast development is referred to as thelarche, and the first menstrual period is termed menarche.

#Médecine #Pathophysiology-Of-Disease #Physiologie

The menstrual cycle has three phases. The follicular phase typically lasts 12–14 days and culminates in the production of a mature oocyte. Initially, a cohort of follicles begins to grow, but ultimately a single dominant follicle is selected, and the rest undergo a process of degeneration and apoptotic death, termed atresia (Figure 22–5). The follicular phase is followed by ovulation, in which the dominant follicle releases its mature oocyte to be transported through the uterine tubes for fertilization and subsequent implantation in a receptive uterus. The third, luteal, phase also averages 14 days and is characterized by luteinization of the ruptured follicle to produce the corpus luteum.

#Médecine #Pathophysiology-Of-Disease #Physiologie

Neurons within the hypothalamus synthesize the peptide GnRH, and its secretion is modulated by endogenous opioids and corticotropin-releasing hormone (CRH). GnRH is secreted directly into the portal circulation of the pituitary in a pulsatile fashion. This pulsatility is required for proper activation of its receptor located on the gonadotropes, which are cells located in the anterior pituitary. In response, the gonadotropes secrete the polypeptides FSH and LH, collectively called gonadotropins, which stimulate the ovary to produce estrogen and inhibin. Inhibin feeds back to suppress FSH secretion but has no effect on LH. Estrogen also affects the pituitary by increasing the number of GnRH receptors and its sensitivity to GnRH stimulation. With estradiol production by the ovaries, a critical concentration is reached for a sufficient time to induce a midcycle LH surge and subsequent ovulation. After this surge, high levels of progesterone produced by the corpus luteum suppress gonadotropin release for the duration of the luteal phase

#Médecine #Pathophysiology-Of-Disease #Physiologie

Activin acts in the ovary to augment the effect of FSH, increasing aromatase activity and increasing the production of FSH and LH receptors

#Médecine #Pathophysiology-Of-Disease #Physiologie

During the early follicular phase, FSH stimulates the growth of a cohort of follicles and increases the production of inhibin and activin in granulosa cells

#Médecine #Pathophysiology-Of-Disease #Physiologie

LH stimulates the production of androgens in the thecal cells, which is augmented by inhibin. Androgens diffuse into the granulosa cells to be converted to estrogens through the enzymatic reaction of aromatization.

#Médecine #Pathophysiology-Of-Disease #Physiologie

The midcycle LH surge triggers the final steps of oocyte maturation and the resumption of meiosis within the dominant oocyte.

#Médecine #Pathophysiology-Of-Disease #Physiologie

Continued secretion from the corpus luteum requires LH (or human chorionic gonadotropin [hCG], as discussed below) stimulation; in its absence, degeneration occurs.

#Médecine #Pathophysiology-Of-Disease #Physiologie

During the follicular phase, the endometrium proliferates under the influence of estrogen, creating straight glands with thin secretions and microvascular proliferation. During the luteal phase, the high levels of estradiol and progesterone promote the maturation of the endometrium, which develops tortuous glands engorged with thick secretions and proteins (see Figure 22–2). Additionally, the endometrium secretes a number of endocrine and paracrine factors (Table 22–1). These changes optimize the environment for implantation. In the absence of pregnancy, the corpus luteum cannot sustain the high levels of progesterone production, and the endometrial vasculature cannot be maintained. This leads to a sloughing of the endometrium and the onset of menstruation, which is marked by the nadir of estradiol and progesterone levels, ending the cycle.

#Médecine #Pathophysiology-Of-Disease #Physiologie

Most preparations of estrogen and progestin block the LH surge at midcycle, thereby preventing ovulation. However, other contraceptive actions include effects on estrogen- and progesterone-sensitive tissues, such as inducing antifertility changes in cervical mucus and the endometrial lining that are unfavorable to sperm transport and embryonic implantation, respectively.
