BuboFlash - helps with learning

Edited, memorised or added to reading queue


Annotation 4761932664076

The epoch is bracketed by two major events in Earth's history.

Paleocene - Wikipedia
n the modern Cenozoic Era. The name is a combination of the Ancient Greek palæo- meaning "old" and the Eocene Epoch (which succeeds the Paleocene), translating to "the old part of the Eocene". The epoch is bracketed by two major events in Earth's history. The K-Pg extinction event, brought on by an asteroid impact and volcanism, marked the beginning of the Paleocene and killed off 75% of living species, most famously the non-avian dinos

Annotation 4762908101900

#MLBook

Let’s start by telling the truth: machines don’t learn. What a typical “learning machine” does is find a mathematical formula which, when applied to a collection of inputs (called “training data”), produces the desired outputs. This mathematical formula also generates the correct outputs for most other inputs (distinct from the training data), on the condition that those inputs come from the same or a similar statistical distribution as the one the training data was drawn from.

Annotation 4762911509772

#MLBook #name-origin

So why the name “machine learning” then? The reason, as is often the case, is marketing: Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term in 1959 while at IBM. Similarly to how in the 2010s IBM tried to market the term “cognitive computing” to stand out from competition, in the 1960s, IBM used the new cool term “machine learning” to attract both clients and talented employees.

Annotation 4762913869068

#MLBook #definition #machine-learning

machine learning is a universally recognized term that usually refers to the science and engineering of building machines capable of doing various useful things without being explicitly programmed to do so.

Annotation 4762916228364

#MLBook #brainstorming #machine-learning

The book also comes in handy when brainstorming at the beginning of a project, when you try to answer the question whether a given technical or business problem is “machine-learnable” and, if yes, which techniques you should try to solve it.

Annotation 4762918849804

#MLBook #data-origin #machine-learning

Machine learning is a subfield of computer science that is concerned with building algorithms which, to be useful, rely on a collection of examples of some phenomenon. These examples can come from nature, be handcrafted by humans or generated by another algorithm.

Annotation 4762921209100

#MLBook #machine-learning #types

Machine learning comes in four types: supervised, semi-supervised, unsupervised, and reinforcement learning.

Annotation 4762923568396

#MLBook #classes #dataset #feature-vector #label #labeled-examples #machine-learning #supervised-learning

In supervised learning, the dataset is the collection of labeled examples \(\{(\mathbf x_i, y_i)\}^N_{i=1}\). Each element \(\mathbf x_i\) among \(N\) is called a feature vector. A feature vector is a vector in which each dimension \(j = 1, \ldots, D\) contains a value that describes the example somehow. That value is called a feature and is denoted as \(x^{(j)}\). For instance, if each example \(\mathbf x\) in our collection represents a person, then the first feature, \(x^{(1)}\), could contain height in cm, the second feature, \(x^{(2)}\), could contain weight in kg, \(x^{(3)}\) could contain gender, and so on. For all examples in the dataset, the feature at position \(j\) in the feature vector always contains the same kind of information. It means that if \(x^{(2)}_i\) contains weight in kg in some example \(\mathbf x_i\), then \(x^{(2)}_k\) will also contain weight in kg in every example \(\mathbf x_k, k = 1, \ldots, N\). The label \(y_i\) can be either an element belonging to a finite set of classes \(\{1, 2, \ldots, C\}\), or a real number, or a more complex structure, like a vector, a matrix, a tree, or a graph. Unless otherwise stated, in this book \(y_i\) is either one of a finite set of classes or a real number. You can see a class as a category to which an example belongs. For instance, if your examples are email messages and your problem is spam detection, then you have two classes \(\{spam, not\_spam\}\).
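
As a rough illustration of the notation, a labeled dataset could be held in plain Python as a list of (feature vector, label) pairs. The numbers, the two features (height in cm, weight in kg) and the class names "healthy" and "sick" are made up for this sketch and are not taken from the book.

```python
from typing import List, Tuple

# Hypothetical toy data: x^(1) = height in cm, x^(2) = weight in kg.
FeatureVector = Tuple[float, float]
LabeledExample = Tuple[FeatureVector, str]

dataset: List[LabeledExample] = [
    ((172.0, 68.5), "healthy"),
    ((160.0, 81.0), "sick"),
    ((185.5, 77.2), "healthy"),
]

N = len(dataset)        # number of labeled examples
D = len(dataset[0][0])  # number of features in each feature vector
classes = sorted({y for _, y in dataset})  # the finite set of classes

# The feature at position j means the same thing in every example:
# here, index 1 (that is, x^(2)) is always the weight in kg.
weights_kg = [x[1] for x, _ in dataset]
```

Storing each example as a tuple keeps the position of every feature fixed, which mirrors the requirement that feature \(j\) always carries the same kind of information.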

Flashcard 4763841596684

Question

An airway is present if the patient is conscious and speaking in a normal tone of voice.

Answer

An airway is present if the patient is conscious and speaking in a normal tone of voice.

Flashcard 4763842645260

Question

[default - edit me]

Answer

An airway is present

Flashcard 4763845528844

Question

What happens when we begin thinking too much about technique and attempt to exercise too much deliberate control over our muscles?

Answer

We shift control back to the cerebral cortex and disrupt the cerebellum’s ability to run off these motor programs automatically, leading to mistakes.

Annotation 4763854966028

#MLBook #goal #model #supervised-learning

The goal of a supervised learning algorithm is to use the dataset to produce a model that takes a feature vector \(\mathbf x\) as input and outputs information that allows deducing the label for this feature vector. For instance, the model created using the dataset of people could take as input a feature vector describing a person and output a probability that the person has cancer.

Annotation 4763858898188

#MLBook #clustering #dimensionality-reduction #machine-learning #model #outlier-detection #unsupervised-learning

In unsupervised learning, the dataset is a collection of unlabeled examples \(\{\mathbf x_i\}^N_{i=1}\). Again, \(\mathbf x\) is a feature vector, and the goal of an unsupervised learning algorithm is to create a model that takes a feature vector \(\mathbf x\) as input and either transforms it into another vector or into a value that can be used to solve a practical problem. For example, in clustering , the model returns the id of the cluster for each feature vector in the dataset. In dimensionality reduction, the output of the model is a feature vector that has fewer features than the input \(\mathbf x\); in outlier detection, the output is a real number that indicates how \(\mathbf x\) is different from a “typical” example in the dataset.

Annotation 4763862043916

#MLBook #machine-learning #semi-supervised-learning

In semi-supervised learning, the dataset contains both labeled and unlabeled examples. Usually, the quantity of unlabeled examples is much higher than the number of labeled examples. The goal of a semi-supervised learning algorithm is the same as the goal of the supervised learning algorithm. The hope here is that using many unlabeled examples can help the learning algorithm to find (we might say “produce” or “compute”) a better model.

It might seem counterintuitive that learning could benefit from adding more unlabeled examples, since doing so appears to add more uncertainty to the problem. However, when you add unlabeled examples, you add more information about your problem: a larger sample better reflects the probability distribution from which the labeled data came. Theoretically, a learning algorithm should be able to leverage this additional information.

Annotation 4763864403212

#MLBook #actions #expected-average-reward #policy #reinforcement-learning #rewards #state

Reinforcement learning is a subfield of machine learning where the machine “lives” in an environment and is capable of perceiving the state of that environment as a vector of features. The machine can execute actions in every state. Different actions bring different rewards and could also move the machine to another state of the environment. The goal of a reinforcement learning algorithm is to learn a policy.

A policy is a function (similar to the model in supervised learning) that takes the feature vector of a state as input and outputs an optimal action to execute in that state. The action is optimal if it maximizes the expected average reward.

Reinforcement learning solves a particular kind of problem where decision making is sequential, and the goal is long-term, such as game playing, robotics, resource management, or logistics.

Annotation 4763865976076

#MLBook #inputs #machine-learning #outputs #supervised-learning

The supervised learning process starts with gathering the data. The data for supervised learning is a collection of pairs (input, output). Input could be anything, for example, email messages, pictures, or sensor measurements. Outputs are usually real numbers, or labels (e.g. “spam”, “not_spam”, “cat”, “dog”, “mouse”, etc). In some cases, outputs are vectors (e.g., four coordinates of the rectangle around a person on the picture), sequences (e.g. [“adjective”, “adjective”, “noun”] for the input “big beautiful car”), or have some other structure.

Annotation 4763869383948

#MLBook #decision-boundary

In machine learning, the boundary separating the examples of different classes is called the decision boundary.

Annotation 4763878558988

#distance #line #point

In the case of a line in the plane given by the equation ax + by + c = 0, where a, b and c are real constants with a and b not both zero, the distance from the line to a point (x₀, y₀) is [1][2]:p.14

\(\operatorname{distance}(ax+by+c=0, (x_0, y_0)) = \frac{|ax_0+by_0+c|}{\sqrt{a^2+b^2}}. \)

The point on this line which is closest to (x₀, y₀) has coordinates [3]:

\(x={\frac {b(bx_{0}-ay_{0})-ac}{a^{2}+b^{2}}}{\text{ and }}y={\frac {a(-bx_{0}+ay_{0})-bc}{a^{2}+b^{2}}}.\)
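
The two formulas above can be checked numerically with a short Python sketch. The function names and the sample line 3x + 4y - 10 = 0 are chosen for illustration only.

```python
import math

def distance_point_to_line(a, b, c, x0, y0):
    """Distance from the point (x0, y0) to the line a*x + b*y + c = 0."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    return abs(a * x0 + b * y0 + c) / math.hypot(a, b)

def closest_point_on_line(a, b, c, x0, y0):
    """Point on the line a*x + b*y + c = 0 that is closest to (x0, y0)."""
    denom = a * a + b * b
    if denom == 0:
        raise ValueError("a and b must not both be zero")
    x = (b * (b * x0 - a * y0) - a * c) / denom
    y = (a * (-b * x0 + a * y0) - b * c) / denom
    return x, y

# Illustrative line 3x + 4y - 10 = 0 and the origin as the point.
d = distance_point_to_line(3, 4, -10, 0, 0)
px, py = closest_point_on_line(3, 4, -10, 0, 0)
```

For this line the distance from the origin is 2, and the closest point is (1.2, 1.6), which lies on the line and is itself at distance 2 from the origin.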

Distance from a point to a line - Wikipedia

Annotation 4763882229004

#distance #straight-lines

Because the lines are parallel, the perpendicular distance between them is a constant, so it does not matter which point is chosen to measure the distance. Given the equations of two non-vertical parallel lines

\(y=mx+b_{1}\,\) \(y=mx+b_{2}\,,\)

the distance between the two lines is the distance between the two intersection points of these lines with the perpendicular line

\(y=-x/m\,.\)

This distance can be found by first solving the linear systems

\({\begin{cases}y=mx+b_{1}\\y=-x/m\,,\end{cases}}\)

and

\({\begin{cases}y=mx+b_{2}\\y=-x/m\,,\end{cases}}\)

to get the coordinates of the intersection points. The solutions to the linear systems are the points

\(\left(x_{1},y_{1}\right)\ =\left({\frac {-b_{1}m}{m^{2}+1}},{\frac {b_{1}}{m^{2}+1}}\right)\,,\)

and

\(\left(x_{2},y_{2}\right)\ =\left({\frac {-b_{2}m}{m^{2}+1}},{\frac {b_{2}}{m^{2}+1}}\right)\,.\)

The distance between the points is

\(d={\sqrt {\left({\frac {b_{1}m-b_{2}m}{m^{2}+1}}\right)^{2}+\left({\frac {b_{2}-b_{1}}{m^{2}+1}}\right)^{2}}}\,,\)

which reduces to

\(d={\frac {|b_{2}-b_{1}|}{{\sqrt {m^{2}+1}}}}\,.\)

When the lines are given by

\(ax+by+c_{1}=0\,\) \(ax+by+c_{2}=0,\,\)

the distance between them can be expressed as

\(d={\frac {|c_{2}-c_{1}|}{{\sqrt {a^{2}+b^{2}}}}}.\)
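
Both closed forms can be compared on one pair of lines with a small Python sketch. The function names and the sample lines y = 0.75x and y = 0.75x + 5 (equivalently 3x - 4y = 0 and 3x - 4y + 20 = 0) are assumptions made for the illustration.

```python
import math

def distance_parallel_slope_form(m, b1, b2):
    """Distance between y = m*x + b1 and y = m*x + b2."""
    return abs(b2 - b1) / math.sqrt(m * m + 1)

def distance_parallel_general_form(a, b, c1, c2):
    """Distance between a*x + b*y + c1 = 0 and a*x + b*y + c2 = 0."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    return abs(c2 - c1) / math.hypot(a, b)

# The same pair of lines written in both forms.
d_slope = distance_parallel_slope_form(0.75, 0.0, 5.0)
d_general = distance_parallel_general_form(3, -4, 0, 20)
```

Both calls give a distance of 4, as expected when the two forms describe the same pair of lines.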

Distance between two straight lines - Wikipedia

Annotation 4763885899020

#MLBook #accuracy #classification-learning-algorithm #decision-boundary

Any classification learning algorithm that builds a model implicitly or explicitly creates a decision boundary. The decision boundary can be straight, or curved, or it can have a complex form, or it can be a superposition of some geometrical figures. The form of the decision boundary determines the accuracy of the model (that is, the ratio of examples whose labels are predicted correctly). The form of the decision boundary and the way it is algorithmically or mathematically computed from the training data differentiate one learning algorithm from another.
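
Accuracy, as defined in the parenthesis above, is simple to compute once true and predicted labels are available. The sketch below uses made-up labels, and the `accuracy` helper is a hypothetical name rather than a function from any particular library.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of examples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one example is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Made-up labels: four of the five predictions match.
y_true = ["spam", "not_spam", "spam", "spam", "not_spam"]
y_pred = ["spam", "not_spam", "not_spam", "spam", "not_spam"]
acc = accuracy(y_true, y_pred)
```

Here `acc` is 0.8, because one of the five examples falls on the wrong side of the (hypothetical) decision boundary.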

Annotation 4763889831180

#MLBook #learning-algorithms #machine-learning #prediction-processing-time #speed-of-model-building

In practice, there are two other essential differentiators of learning algorithms to consider: speed of model building and prediction processing time. In many practical cases, you would prefer a learning algorithm that builds a less accurate model fast. Additionally, you might prefer a less accurate model that is much quicker at making predictions.

Annotation 4763892452620

[unknown IMAGE 4763872791820]

#MLBook #error #has-images #machine-learning #prediction #probability

Why is a machine-learned model capable of predicting correctly the labels of new, previously unseen examples? To understand that, look at the plot in Figure 1. If two classes are separable from one another by a decision boundary, then, obviously, examples that belong to each class are located in two different subspaces which the decision boundary creates.

If the examples used for training were selected randomly, independently of one another, and following the same procedure, then, statistically, it is more likely that a new negative example will be located on the plot somewhere not too far from other negative examples. The same holds for a new positive example: it will likely come from the surroundings of other positive examples. In such a case, our decision boundary will still, with high probability, separate new positive and negative examples well. For other, less likely situations, our model will make errors, but because such situations are less likely, the number of errors will likely be smaller than the number of correct predictions.

Intuitively, the larger the set of training examples, the less likely it is that new examples will be dissimilar to (and lie on the plot far from) the examples used for training.

Annotation 4763896909068

#MLBook #machine-learning #set

A set is an unordered collection of unique elements.

Annotation 4763899268364

#MLBook #cardinality-operator #machine-learning

The cardinality operator \(\left\vert \mathcal S \right\vert\) returns the number of elements in set \(\mathcal S\).
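
Python's built-in `set` type behaves like the mathematical notion of a set, so it can serve as a quick illustration of cardinality. The element values are arbitrary.

```python
# Duplicates collapse, because a set holds unique elements only.
S = {3, 1, 3, 2, 1}

# len() plays the role of the cardinality operator |S|.
cardinality = len(S)

# Order does not matter when comparing sets.
same_set = S == {1, 2, 3}
```

Here `cardinality` is 3 and `same_set` is `True`.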

Annotation 4763901627660

#MLBook #codomain #domain #function #machine-learning

A function is a relation that associates each element \(x\) of a set \(\mathcal X\) , the domain of the function, to a single element \(y\) of another set \(\mathcal Y\) , the codomain of the function.

Annotation 4763903986956

[unknown IMAGE 4763906608396]

#MLBook #has-images #local-minimum #machine-learning

We say that \(f(x)\) has a local minimum at \(x = c\) if \(f(x) \ge f(c)\) for every \(x\) in some open interval around \(x = c\).

Annotation 4763910016268

#MLBook #machine-learning #vector-function

A vector function, denoted as \(\mathbf y = \mathbf f(x)\), is a function that returns a vector \(\mathbf y\). It can have a vector or a scalar argument.

Annotation 4763912375564

#MLBook #arg-max #machine-learning #max

Given a set of values \(\mathcal A = \{a_1, a_2, \ldots , a_n \}\), the operator \(\max_{a \in \mathcal A} f(a)\) returns the highest value \(f(a)\) for all elements in the set \(\mathcal A\). On the other hand, the operator \(\arg \max_{a \in \mathcal A} f(a)\) returns the element of the set \(\mathcal A\) that maximizes \(f(a)\). Sometimes, when the set is implicit or infinite, we can write \(\max_a f(a)\) or \(\arg \max_a f(a)\).
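
The difference between the two operators is easy to see in code. In this sketch the set `A` and the function `f` are arbitrary choices made for illustration.

```python
A = [-2, -1, 0, 1, 3]

def f(a):
    """An arbitrary function with its peak at a = 1."""
    return -(a - 1) ** 2

# max returns the highest value of f over the set.
max_value = max(f(a) for a in A)

# arg max returns the element of the set at which f is highest.
arg_max = max(A, key=f)
```

Here `max_value` is 0 (a value of `f`), while `arg_max` is 1 (an element of `A`).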

Annotation 4763914734860

#MLBook #gradient #machine-learning #partial-derivatives

Gradient is the generalization of derivative for functions that take several inputs (or one input in the form of a vector or some other complex structure). A gradient of a function is a vector of partial derivatives. You can look at finding a partial derivative of a function as the process of finding the derivative by focusing on one of the function’s inputs and by considering all other inputs as constant values.

Annotation 4763917094156

#MLBook #gradient #machine-learning

The gradient of function \(f \left( \left[ x^{(1)}, x^{(2)} \right] \right)\), denoted as \(\nabla f\), is given by the vector \(\left[ \frac{\partial f}{\partial x^{(1)}}, \frac{\partial f}{\partial x^{(2)}} \right]\).
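
To make the definition concrete, the sketch below compares a hand-derived gradient with a finite-difference approximation. The function \(f = (x^{(1)})^2 + 3 x^{(1)} x^{(2)}\), the evaluation point and the step size `h` are assumptions chosen for the example. Central differences are a generic numerical technique, not something the book prescribes.

```python
def f(x1, x2):
    """Example function: f = x1^2 + 3*x1*x2."""
    return x1 ** 2 + 3 * x1 * x2

def analytic_gradient(x1, x2):
    """Partial derivatives of f worked out by hand."""
    return [2 * x1 + 3 * x2, 3 * x1]

def numerical_gradient(func, x1, x2, h=1e-6):
    """Approximate each partial derivative by varying one input
    while holding the other constant (central differences)."""
    d1 = (func(x1 + h, x2) - func(x1 - h, x2)) / (2 * h)
    d2 = (func(x1, x2 + h) - func(x1, x2 - h)) / (2 * h)
    return [d1, d2]

exact = analytic_gradient(1.0, 2.0)         # [8.0, 3.0]
approx = numerical_gradient(f, 1.0, 2.0)    # close to [8.0, 3.0]
```

Each component of `approx` perturbs a single input and treats the other as a constant, which is the reading of a partial derivative described above.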

Annotation 4763920502028

#MLBook #machine-learning #random-variable

A random variable, usually written as an italic capital letter, like \(X\) , is a variable whose possible values are numerical outcomes of a random phenomenon.

Remark: In the following, the author gives an example in which red, yellow, and blue are possible values. So, the outcomes are not necessarily numbers.

Annotation 4763926007052

[unknown IMAGE 4763923909900]

#MLBook #has-images #machine-learning #pmf #probability-distribution #probability-mass-function

The probability distribution of a discrete random variable is described by a list of probabilities associated with each of its possible values. This list of probabilities is called a probability mass function (pmf). For example: \(\operatorname{Pr}(X=red) = 0.3\), \(\operatorname{Pr}(X=yellow) = 0.45\), \(\operatorname{Pr}(X=blue) = 0.25\). Each probability in a probability mass function is a value greater than or equal to 0. The sum of probabilities equals 1 (Figure 3a).
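
The two properties of a pmf stated above can be checked mechanically. The sketch below stores the example pmf as a dictionary. The `is_valid_pmf` helper and its tolerance are hypothetical names introduced for the illustration.

```python
import math

# The pmf from the example above: value -> probability.
pmf = {"red": 0.3, "yellow": 0.45, "blue": 0.25}

def is_valid_pmf(pmf, tol=1e-9):
    """True if every probability is >= 0 and the probabilities sum to 1."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

valid = is_valid_pmf(pmf)
```

A tolerance is used for the sum because floating-point addition is not always exact.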

Annotation 4763929677068

[unknown IMAGE 4763923909900]

#MLBook #has-images #machine-learning #pdf #probability-density-function

Because the number of values of a continuous random variable \(X\) is infinite, the probability \(\operatorname{Pr}(X=c)\) for any \(c\) is 0. Therefore, instead of the list of probabilities, the probability distribution of a continuous random variable (a continuous probability distribution) is described by a probability density function (pdf). The pdf is a function whose codomain is nonnegative and for which the area under the curve is equal to 1 (Figure 3b).

Annotation 4763933609228

#MLBook #expectation #expected-value #machine-learning #statistics

Let a discrete random variable \(X\) have \(k\) possible values \(\{ x_i \}_{i=1}^k\). The expectation of \(X\) denoted as \(\mathbb E[X]\) is given by,

\(\begin{align} \mathbb E[X] & \stackrel{\textrm{def}}{=} \sum_{i=1}^k \left[ x_i \cdot \textrm{Pr} \left( X = x_i \right) \right] \\ & = x_1 \cdot \textrm{Pr} \left( X = x_1 \right) + x_2 \cdot \textrm{Pr} \left( X = x_2 \right) + \cdots + x_k \cdot \textrm{Pr} \left( X = x_k \right) \end{align}\)

where \(\textrm{Pr} \left( X = x_i \right)\) is the probability that \(X\) has the value \(x_i\) according to the pmf. The expectation of a random variable is also called the mean, average or expected value and is frequently denoted with the letter \(\mu\) . The expectation is one of the most important statistics of a random variable.
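
As a small worked case, the expectation of a fair six-sided die follows directly from the definition. The die is an assumed example, and `expectation` is a hypothetical helper name.

```python
def expectation(pmf):
    """E[X] = sum over all values x_i of x_i * Pr(X = x_i)."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}
mu = expectation(die)
```

The result is approximately 3.5, which is not itself a possible outcome of the die. The expectation is a weighted average, not a most likely value.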

Annotation 4763936230668

#MLBook #machine-learning #standard-deviation #variance

Another important statistic is the standard deviation, defined as,

\(\sigma \stackrel{\textrm{def}}{=} \sqrt{\mathbb E \left[ \left( X - \mu\right)^2 \right] }.\)

Variance, denoted as \(\sigma^2\) or \(var(X)\), is defined as,

\(\sigma^2 \stackrel{\textrm{def}}{=} \mathbb E \left[ \left( X - \mu\right)^2 \right].\)

For a discrete random variable, the standard deviation is given by:

\(\sigma \stackrel{\textrm{def}}{=} \sqrt{\textrm{Pr} \left( X = x_1 \right) \left( x_1 - \mu \right)^2 + \textrm{Pr} \left( X = x_2 \right) \left( x_2 - \mu \right)^2 + \cdots + \textrm{Pr} \left( X = x_k \right) \left( x_k - \mu \right)^2},\)

where \(\mu = \mathbb E \left[ X \right]\).
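
The discrete formulas can be traced with a short Python sketch. The pmf below is made up so that the arithmetic comes out round, and the helper names are hypothetical.

```python
import math

def expectation(pmf):
    """E[X] for a discrete pmf given as value -> probability."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """sigma^2 = sum of Pr(X = x_i) * (x_i - mu)^2."""
    mu = expectation(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())

def standard_deviation(pmf):
    """sigma is the square root of the variance."""
    return math.sqrt(variance(pmf))

# Made-up pmf: mu = 2.1, sigma^2 = 0.49, sigma = 0.7 (up to rounding).
pmf = {1: 0.2, 2: 0.5, 3: 0.3}
mu = expectation(pmf)
var = variance(pmf)
sigma = standard_deviation(pmf)
```

By hand: \(\mu = 0.2 + 1.0 + 0.9 = 2.1\) and \(\sigma^2 = 0.2 \cdot 1.21 + 0.5 \cdot 0.01 + 0.3 \cdot 0.81 = 0.49\), so \(\sigma = 0.7\).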

Annotation 4763940687116

#MLBook #continuous-random-variable #expectation #machine-learning

The expectation of a continuous random variable \(X\) is given by,

\(\mathbb E \left[ X \right] \stackrel{\textrm{def}}{=} \int_{\mathbb R} x f_X \left( x \right) dx,\)

where \(f_X\) is the pdf of the variable \(X\) and \(\int_{\mathbb R}\) is the integral of function \(x f_X\) .

Annotation 4763943046412

#MLBook #dataset #examples #machine-learning #sample

Most of the time we don’t know \(f_X\) , but we can observe some values of \(X\). In machine learning, we call these values examples, and the collection of these examples is called a sample or a dataset.

Annotation 4763946716428

#MLBook #machine-learning #sample-statistic #unbiased-estimators

Because \(f_X\) is usually unknown, but we have a sample \(S_X = \{ x_i \}_{i=1}^N\) , we often content ourselves not with the true values of statistics of the probability distribution, such as expectation, but with their unbiased estimators.

We say that \(\hat{\theta} \left( S_X \right)\) is an unbiased estimator of some statistic \(\theta\) calculated using a sample \(S_X\) drawn from an unknown probability distribution if \(\hat{\theta} \left( S_X \right)\) has the following property:

\(\mathbb E \left[ \hat{\theta} \left( S_X \right) \right] = \theta,\)

where \(\hat{\theta}\) is a sample statistic, obtained using a sample \(S_X\) and not the real statistic \(\theta\) that can be obtained only knowing \(X\); the expectation is taken over all possible samples drawn from \(X\) . Intuitively, this means that if you can have an unlimited number of such samples as \(S_X\), and you compute some unbiased estimator, such as \(\hat{\mu}\) , using each sample, then the average of all these \(\hat{\mu}\) equals the real statistic \(\mu\) that you would get computed on \(X\).

Annotation 4764500626700

#MLBook #machine-learning #sample-mean

It can be shown that an unbiased estimator of an unknown \(\mathbb E \left[ X \right]\) (given by either eq. 1 or eq. 2) is given by \(\frac{1}{N} \sum_{i=1}^N x_i\) (called in statistics the sample mean).
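
For a small discrete distribution, unbiasedness of the sample mean can be verified exactly by enumerating every possible sample instead of simulating. The sketch assumes the \(N\) draws are independent and identically distributed. The pmf and the sample size \(N = 2\) are made up for the illustration.

```python
from itertools import product

# Made-up pmf of X and its true expectation.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}
mu = sum(x * p for x, p in pmf.items())

N = 2  # sample size, kept small so that all samples can be listed

# Average the sample mean over all possible samples of size N,
# weighting each sample by its probability under independent draws.
expected_sample_mean = 0.0
for sample in product(pmf, repeat=N):
    prob = 1.0
    for x in sample:
        prob *= pmf[x]
    expected_sample_mean += prob * (sum(sample) / N)
```

Up to floating-point rounding, `expected_sample_mean` equals `mu`, which is the property \(\mathbb E[\hat{\mu}(S_X)] = \mu\) for this particular distribution and sample size.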

Annotation 4764502985996

#Bayes-rule #Bayes-theorem #MLBook #machine-learning

The conditional probability \(\textrm{Pr} \left( X=x \vert Y=y \right)\) is the probability of the random variable \(X\) to have a specific value \(x\) given that another random variable \(Y\) has a specific value of \(y\). The Bayes’ Rule (also known as the Bayes’ Theorem) stipulates that:

\(\textrm{Pr} \left( X=x \vert Y=y \right) = \displaystyle \frac{\textrm{Pr} \left( Y=y \vert X=x \right) \textrm{Pr} \left( X=x \right)}{\textrm{Pr} \left( Y=y \right)}\).
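
A numeric sketch may help in reading the formula. All probabilities below are invented: \(X\) is a hypothetical condition (sick or healthy) and \(Y\) a hypothetical test result. The denominator \(\textrm{Pr}(Y=y)\) is obtained with the law of total probability, which is a standard identity but is not stated in the passage above.

```python
# Invented numbers for illustration only.
p_sick = 0.01                 # Pr(X = sick)
p_pos_given_sick = 0.9        # Pr(Y = positive | X = sick)
p_pos_given_healthy = 0.05    # Pr(Y = positive | X = healthy)

# Law of total probability: Pr(Y = positive).
p_pos = (p_pos_given_sick * p_sick
         + p_pos_given_healthy * (1 - p_sick))

# Bayes' Rule: Pr(X = sick | Y = positive).
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
```

With these numbers the posterior is 2/13, roughly 0.154, far below the 0.9 that a quick reading of the test's hit rate might suggest, because the prior \(\textrm{Pr}(X = \text{sick})\) is small.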

Annotation 4764513733900

#MLBook #machine-learning #review

2.5 Parameter Estimation

Annotation 4764516093196

#MLBook #hyperparameter #machine-learning

A hyperparameter is a property of a learning algorithm, usually (but not always) having a numerical value. That value influences the way the algorithm works. Hyperparameters aren’t learned by the algorithm itself from data. They have to be set by the data analyst before running the algorithm. I show how to do that in Chapter 5.

Annotation 4764518452492

#MLBook #machine-learning #parameters

Parameters are variables that define the model learned by the learning algorithm. Parameters are directly modified by the learning algorithm based on the training data. The goal of learning is to find such values of parameters that make the model optimal in a certain sense.
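
The distinction between parameters and hyperparameters can be made concrete with a small sketch. It assumes a one-feature linear model trained by plain gradient descent on mean squared error, which is an illustrative choice and not the book's code: `learning_rate` and `epochs` are hyperparameters fixed by the analyst before training, while `w` and `b` are parameters that the algorithm modifies from the data.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on mean squared error.

    learning_rate and epochs are hyperparameters: set before training.
    w and b are parameters: changed by the algorithm using the data.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = fit_line(xs, ys)
print(w, b)  # close to 2 and 1
```

Changing `learning_rate` changes how the search behaves (too large a value makes it diverge on this data), but the value itself is never updated by the training loop.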

Annotation 4764520811788

#MLBook #classification #label #machine-learning #unlabeled-example

Classification is a problem of automatically assigning a label to an unlabeled example. Spam detection is a famous example of classification.

Annotation 4764523171084

#MLBook #classification-learning-algorithm #labeled-examples #machine-learning #model

In machine learning, the classification problem is solved by a classification learning algorithm that takes a collection of labeled examples as inputs and produces a model that can take an unlabeled example as input and either directly output a label or output a number that can be used by the analyst to deduce the label. An example of such a number is a probability.
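
The second case, a model that outputs a number from which the analyst deduces the label, might look like the sketch below. The logistic form of the model, the weights, the bias, and the 0.5 threshold are all assumptions made for illustration; the text only says that the number can be a probability.

```python
import math


def predict_probability(features, weights, bias):
    """Return a number in (0, 1), read here as the probability of "spam"."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def deduce_label(probability, threshold=0.5):
    """The analyst's rule for turning the model's number into a label."""
    return "spam" if probability >= threshold else "not_spam"


# Hypothetical parameters, as if produced by a learning algorithm.
weights = [2.0, -1.0]
bias = -0.5

p_first = predict_probability([1.5, 0.2], weights, bias)
p_second = predict_probability([0.0, 1.0], weights, bias)
print(p_first, deduce_label(p_first))
print(p_second, deduce_label(p_second))
```

The threshold belongs to the analyst, not the model: raising it trades missed spam for fewer false alarms without retraining anything.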

Annotation 4764525530380

#MLBook #binary-classification #binomial-classification #classes #machine-learning #multiclass-classification #multinomial-classification

In a classification problem, a label is a member of a finite set of classes. If the size of the set of classes is two (“sick”/“healthy”, “spam”/“not_spam”), we talk about binary classification (also called binomial in some sources). Multiclass classification (also called multinomial) is a classification problem with three or more classes.

Annotation 4764528413964

#MLBook #machine-learning #regression #target

Regression is a problem of predicting a real-valued label (often called a target) given an unlabeled example. Estimating a house's price from features such as area, number of bedrooms, and location is a famous example of regression.

Annotation 4764530773260

#MLBook #machine-learning #regression-learning-algorithm

The regression problem is solved by a regression learning algorithm that takes a collection of labeled examples as inputs and produces a model that can take an unlabeled example as input and output a target.
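
As a minimal sketch of that pipeline, the code below uses ordinary least squares on a single feature, which is just one possible regression learning algorithm chosen for brevity. The house areas and prices are made-up numbers. The learning function takes labeled examples and returns a model, and the model maps an unlabeled example to a target.

```python
def fit_least_squares(xs, ys):
    """Learn slope and intercept from labeled examples; return the model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x

    def model(x):
        """Take an unlabeled example (an area) and output a target (a price)."""
        return slope * x + intercept

    return model


# Hypothetical labeled examples: (area in square meters, price).
areas = [50.0, 80.0, 100.0, 120.0]
prices = [70000.0, 100000.0, 120000.0, 140000.0]

model = fit_least_squares(areas, prices)
predicted = model(90.0)
print(predicted)  # about 110000 for this perfectly linear toy data
```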

Annotation 4764533132556

#MLBook #machine-learning #model-based-learning #model-parameters

Most supervised learning algorithms are model-based. We have already seen one such algorithm: SVM. Model-based learning algorithms use the training data to create a model that has parameters learned from the training data. In SVM, the two parameters we saw were \(\mathbf w^\ast\) and \(b^\ast\). Once the model is built, the training data can be discarded.

Annotation 4764536016140

#MLBook #instance-based #k-nearest-neighbors #kNN #learning #machine-learning

Instance-based learning algorithms use the whole dataset as the model. One instance-based algorithm frequently used in practice is k-Nearest Neighbors (kNN). In classification, to predict a label for an input example the kNN algorithm looks at the close neighborhood of the input example in the space of feature vectors and outputs the label that it saw the most often in this close neighborhood.
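
The description above can be sketched in a few lines. This version assumes Euclidean distance and a simple majority vote, and the tiny dataset is invented for illustration; note that the "model" is nothing more than the stored training examples.

```python
import math
from collections import Counter


def knn_predict(examples, query, k=3):
    """Predict a label for `query` from its k nearest labeled examples.

    `examples` is a list of (feature_vector, label) pairs. Distance is
    Euclidean, an assumption; other metrics are possible.
    """
    neighbors = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common breaks ties by first-seen order, i.e. the nearer neighbor.
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples in a two-dimensional feature space.
training = [
    ((1.0, 1.0), "spam"),
    ((1.2, 0.8), "spam"),
    ((0.9, 1.1), "spam"),
    ((5.0, 5.0), "not_spam"),
    ((5.2, 4.9), "not_spam"),
    ((4.8, 5.1), "not_spam"),
]

print(knn_predict(training, (1.1, 1.0)))  # spam
print(knn_predict(training, (5.0, 4.8)))  # not_spam
```

Unlike the model-based case, the training data cannot be discarded here: every prediction searches it again.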

Annotation 4764539424012

#MLBook #deep-learning #deep-neural-networks #layer #machine-learning #neural-network #shallow-learning

A shallow learning algorithm learns the parameters of the model directly from the features of the training examples. Most supervised learning algorithms are shallow. The notable exceptions are neural network learning algorithms, specifically those that build neural networks with more than one layer between input and output. Such neural networks are called deep neural networks. In deep neural network learning (or, simply, deep learning), contrary to shallow learning, most model parameters are learned not directly from the features of the training examples, but from the outputs of the preceding layers.
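
To illustrate what "the outputs of the preceding layers" means, here is a minimal forward pass through a network with one hidden layer. The weights are hand-picked, hypothetical values and no training is shown; the point is only that the second layer's parameters are applied to the hidden layer's outputs, never to the raw features.

```python
def relu(value):
    """Rectified linear unit, a common (here assumed) activation function."""
    return max(0.0, value)


def dense_layer(inputs, weight_rows, biases, activation):
    """Compute one layer: each unit is activation(w . inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weight_rows, biases)
    ]


def identity(value):
    return value


# Hypothetical parameters of a tiny two-layer network.
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, 0.0]
output_weights = [[2.0, 3.0]]
output_biases = [0.5]

features = [1.0, 2.0]
hidden = dense_layer(features, hidden_weights, hidden_biases, relu)
output = dense_layer(hidden, output_weights, output_biases, identity)
print(hidden, output)  # [0.0, 1.5] [5.0]
```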

Annotation 4764615707916

#Clinique #EBM #Médecine #Sémiologie

Sensitivity is the proportion of patients with the diagnosis who have the physical sign (i.e., have the positive result). Specificity is the proportion of patients without the diagnosis who lack the physical sign (i.e., have the negative result).
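
Both definitions reduce to simple ratios over the four cells of a 2x2 table. The sketch below uses invented counts (100 patients with the diagnosis, 100 without) purely to show the arithmetic.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of patients WITH the diagnosis who have the sign."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of patients WITHOUT the diagnosis who lack the sign."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts for one physical sign.
sign_present_with_diagnosis = 80     # true positives
sign_absent_with_diagnosis = 20      # false negatives
sign_absent_without_diagnosis = 90   # true negatives
sign_present_without_diagnosis = 10  # false positives

sens = sensitivity(sign_present_with_diagnosis, sign_absent_with_diagnosis)
spec = specificity(sign_absent_without_diagnosis, sign_present_without_diagnosis)
print(sens, spec)  # 0.8 0.9
```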

Annotation 4768344444172

The two ovaries contain thousands of follicles, each with an oocyte surrounded by a layer of granulosa cells and thecal cells. These supporting cells produce steroids and paracrine products important in follicular maturation and the coordination of events in reproduction.

Annotation 4768349424908

Until week 8 of gestation, the sex of the embryo cannot be determined morphologically; therefore, this period is termed the indifferent phase of sexual development. After this time, differentiation of the internal and external genitalia occurs, determining the phenotypic sex of the individual, which becomes fully developed after puberty.

Annotation 4768350473484

After 8 weeks of gestation, the production of anti-müllerian hormone by Sertoli cells in the fetal testes leads to regression of the müllerian ducts, whereas production of testosterone by the Leydig cells leads to the persistence of the wolffian duct and the subsequent development of the prostate, epididymis, and seminal vesicles. In the absence of these secretions, female internal reproductive organs are formed from the müllerian ducts, and the wolffian structures degenerate.
