# on 08-Jan-2020 (Wed)

#### Annotation 4761932664076

 The epoch is bracketed by two major events in Earth's history.

Paleocene - Wikipedia
…in the modern Cenozoic Era. The name is a combination of the Ancient Greek palæo- meaning "old" and the Eocene Epoch (which succeeds the Paleocene), translating to "the old part of the Eocene". The epoch is bracketed by two major events in Earth's history. The K-Pg extinction event, brought on by an asteroid impact and volcanism, marked the beginning of the Paleocene and killed off 75% of living species, most famously the non-avian dinosaurs.

#### Annotation 4762908101900

 #MLBook Let’s start by telling the truth: machines don’t learn. What a typical “learning machine” does is find a mathematical formula which, when applied to a collection of inputs (called “training data”), produces the desired outputs. This mathematical formula also generates the correct outputs for most other inputs (distinct from the training data), on the condition that those inputs come from the same or a similar statistical distribution as the one the training data was drawn from.

#### pdf

cannot see any pdfs

#### Annotation 4762911509772

 #MLBook #name-origin So why the name “machine learning” then? The reason, as is often the case, is marketing: Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term in 1959 while at IBM. Similarly to how in the 2010s IBM tried to market the term “cognitive computing” to stand out from competition, in the 1960s, IBM used the new cool term “machine learning” to attract both clients and talented employees.


#### Annotation 4762913869068

 #MLBook #definition #machine-learning Machine learning is a universally recognized term that usually refers to the science and engineering of building machines capable of doing various useful things without being explicitly programmed to do so.


#### Annotation 4762916228364

 #MLBook #brainstorming #machine-learning The book also comes in handy when brainstorming at the beginning of a project, when you try to answer the question of whether a given technical or business problem is “machine-learnable” and, if so, which techniques you should try to solve it.


#### Annotation 4762918849804

 #MLBook #data-origin #machine-learning Machine learning is a subfield of computer science that is concerned with building algorithms which, to be useful, rely on a collection of examples of some phenomenon. These examples can come from nature, be handcrafted by humans or generated by another algorithm.


#### Annotation 4762921209100

 #MLBook #machine-learning #types Learning can be supervised, semi-supervised, unsupervised, or reinforcement learning.


#### Annotation 4762923568396

 #MLBook #classes #dataset #feature-vector #label #labeled-examples #machine-learning #supervised-learning In supervised learning, the dataset is the collection of labeled examples $$\{(\mathbf x_i, y_i)\}_{i=1}^N$$. Each element $$\mathbf x_i$$ among the $$N$$ examples is called a feature vector. A feature vector is a vector in which each dimension $$j = 1, \ldots, D$$ contains a value that describes the example somehow. That value is called a feature and is denoted as $$x^{(j)}$$. For instance, if each example $$\mathbf x$$ in our collection represents a person, then the first feature, $$x^{(1)}$$, could contain height in cm, the second feature, $$x^{(2)}$$, could contain weight in kg, $$x^{(3)}$$ could contain gender, and so on. For all examples in the dataset, the feature at position $$j$$ in the feature vector always contains the same kind of information. It means that if $$x^{(2)}_i$$ contains weight in kg in some example $$\mathbf x_i$$, then $$x^{(2)}_k$$ will also contain weight in kg in every example $$\mathbf x_k, k = 1, \ldots, N$$. The label $$y_i$$ can be either an element belonging to a finite set of classes $$\{1, 2, \ldots, C\}$$, or a real number, or a more complex structure, like a vector, a matrix, a tree, or a graph. Unless otherwise stated, in this book $$y_i$$ is either one of a finite set of classes or a real number. You can see a class as a category to which an example belongs. For instance, if your examples are email messages and your problem is spam detection, then you have two classes $$\{spam, not\_spam\}$$.
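The notation above can be made concrete with a tiny plain-Python sketch of a labeled dataset (all values invented for illustration):

```python
# A toy labeled dataset {(x_i, y_i)}_{i=1}^N with N = 4 examples and
# D = 3 features per vector: height (cm), weight (kg), gender (0/1).
X = [
    [170.0, 65.0, 0],
    [182.0, 80.0, 1],
    [158.0, 52.0, 0],
    [175.0, 74.0, 1],
]
y = [0, 1, 0, 1]  # labels drawn from the finite set of classes {0, 1}

N = len(X)           # number of examples
D = len(X[0])        # dimensionality of each feature vector
x1_weight = X[0][1]  # x_1^{(2)}: the weight feature of the first example
```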


#### Flashcard 4763841596684

Question
An airway is present if the patient is conscious and speaking in a normal tone of voice.




#### Flashcard 4763845528844

Question
What happens when we begin thinking too much about technique and attempt to exercise too much deliberate control over our muscles?
We shift control back to the cerebral cortex and disrupt the cerebellum’s ability to run off these motor programs automatically, leading to mistakes.



#### Annotation 4763854966028

 #MLBook #goal #model #supervised-learning The goal of a supervised learning algorithm is to use the dataset to produce a model that takes a feature vector $$\mathbf x$$ as input and outputs information that allows deducing the label for this feature vector. For instance, the model created using the dataset of people could take as input a feature vector describing a person and output a probability that the person has cancer.


#### Annotation 4763858898188

 #MLBook #clustering #dimensionality-reduction #machine-learning #model #outlier-detection #unsupervised-learning In unsupervised learning, the dataset is a collection of unlabeled examples $$\{\mathbf x_i\}^N_{i=1}$$. Again, $$\mathbf x$$ is a feature vector, and the goal of an unsupervised learning algorithm is to create a model that takes a feature vector $$\mathbf x$$ as input and either transforms it into another vector or into a value that can be used to solve a practical problem. For example, in clustering , the model returns the id of the cluster for each feature vector in the dataset. In dimensionality reduction, the output of the model is a feature vector that has fewer features than the input $$\mathbf x$$; in outlier detection, the output is a real number that indicates how $$\mathbf x$$ is different from a “typical” example in the dataset.
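As a minimal sketch of the outlier-detection flavor of unsupervised learning (not code from the book; the 1-D data and the z-score-style model are invented for illustration), the "trained" model maps an example to a real number indicating how different it is from a typical example:

```python
import statistics

def make_outlier_scorer(examples):
    """'Train' on unlabeled 1-D examples; return a model that maps a
    value x to a real number saying how far x is from a typical example."""
    mu = statistics.mean(examples)
    sigma = statistics.stdev(examples)
    return lambda x: abs(x - mu) / sigma  # distance in standard deviations

data = [9.8, 10.1, 10.0, 9.9, 10.2]  # unlabeled examples
score = make_outlier_scorer(data)
```

A typical value gets a small score, while a value far from the sample gets a large one.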


#### Annotation 4763862043916

 #MLBook #machine-learning #semi-supervised-learning In semi-supervised learning, the dataset contains both labeled and unlabeled examples. Usually, the quantity of unlabeled examples is much higher than the number of labeled examples. The goal of a semi-supervised learning algorithm is the same as that of a supervised learning algorithm. The hope is that using many unlabeled examples can help the learning algorithm find (we might say “produce” or “compute”) a better model. It might seem counterintuitive that learning could benefit from adding more unlabeled examples; it looks like we add more uncertainty to the problem. However, when you add unlabeled examples, you add more information about your problem: a larger sample better reflects the probability distribution from which the labeled data came. Theoretically, a learning algorithm should be able to leverage this additional information.


#### Annotation 4763864403212

 #MLBook #actions #expected-average-reward #policy #reinforcement-learning #rewards #state Reinforcement learning is a subfield of machine learning where the machine “lives” in an environment and is capable of perceiving the state of that environment as a vector of features. The machine can execute actions in every state. Different actions bring different rewards and could also move the machine to another state of the environment. The goal of a reinforcement learning algorithm is to learn a policy. A policy is a function (similar to the model in supervised learning) that takes the feature vector of a state as input and outputs an optimal action to execute in that state. The action is optimal if it maximizes the expected average reward. Reinforcement learning solves a particular kind of problem where decision making is sequential, and the goal is long-term, such as game playing, robotics, resource management, or logistics.
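A policy can be sketched as a function from states to actions. In this tiny tabular sketch the state names, actions, and expected-reward numbers are all invented, and the expected rewards are assumed to be already known rather than learned:

```python
# Hypothetical table of expected rewards for (state, action) pairs.
expected_reward = {
    ("low_battery", "recharge"): 10.0,
    ("low_battery", "explore"):  -5.0,
    ("charged",     "recharge"):  0.0,
    ("charged",     "explore"):   8.0,
}

def policy(state):
    """Return the action that maximizes the expected reward in `state`."""
    actions = ["recharge", "explore"]
    return max(actions, key=lambda a: expected_reward[(state, a)])
```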


#### Annotation 4763865976076

 #MLBook #inputs #machine-learning #outputs #supervised-learning The supervised learning process starts with gathering the data. The data for supervised learning is a collection of pairs (input, output). Input could be anything, for example, email messages, pictures, or sensor measurements. Outputs are usually real numbers or labels (e.g. “spam”, “not_spam”, “cat”, “dog”, “mouse”, etc.). In some cases, outputs are vectors (e.g., the four coordinates of the rectangle around a person in the picture), sequences (e.g. [“adjective”, “adjective”, “noun”] for the input “big beautiful car”), or have some other structure.


#### Annotation 4763869383948

 #MLBook #decision-boundary In machine learning, the boundary separating the examples of different classes is called the decision boundary.


#### Annotation 4763878558988

 #distance #line #point In the case of a line in the plane given by the equation $$ax + by + c = 0$$, where $$a$$, $$b$$ and $$c$$ are real constants with $$a$$ and $$b$$ not both zero, the distance from the line to a point $$(x_0, y_0)$$ is[1][2]:p.14 $$\operatorname{distance}(ax+by+c=0, (x_0, y_0)) = \frac{|ax_0+by_0+c|}{\sqrt{a^2+b^2}}.$$ The point on this line which is closest to $$(x_0, y_0)$$ has coordinates:[3] $$x={\frac {b(bx_{0}-ay_{0})-ac}{a^{2}+b^{2}}}{\text{ and }}y={\frac {a(-bx_{0}+ay_{0})-bc}{a^{2}+b^{2}}}.$$

Distance from a point to a line - Wikipedia
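The two formulas translate directly into code; this is a sketch for checking values by hand, not library code:

```python
from math import sqrt

def point_line_distance(a, b, c, x0, y0):
    """Distance from the point (x0, y0) to the line ax + by + c = 0."""
    return abs(a * x0 + b * y0 + c) / sqrt(a * a + b * b)

def closest_point(a, b, c, x0, y0):
    """Coordinates of the point on ax + by + c = 0 closest to (x0, y0)."""
    d = a * a + b * b
    x = (b * (b * x0 - a * y0) - a * c) / d
    y = (a * (-b * x0 + a * y0) - b * c) / d
    return x, y
```

For example, for the line x + y - 2 = 0 and the origin, the closest point is (1, 1) at distance √2.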

#### Annotation 4763882229004

 #distance #straight-lines Because the lines are parallel, the perpendicular distance between them is a constant, so it does not matter which point is chosen to measure the distance. Given the equations of two non-vertical parallel lines $$y=mx+b_{1}\,$$ $$y=mx+b_{2}\,,$$ the distance between the two lines is the distance between the two intersection points of these lines with the perpendicular line $${\displaystyle y=-x/m\,.}$$ This distance can be found by first solving the linear systems $${\begin{cases}y=mx+b_{1}\\y=-x/m\,,\end{cases}}$$ and $${\begin{cases}y=mx+b_{2}\\y=-x/m\,,\end{cases}}$$ to get the coordinates of the intersection points. The solutions to the linear systems are the points $$\left(x_{1},y_{1}\right)\ =\left({\frac {-b_{1}m}{m^{2}+1}},{\frac {b_{1}}{m^{2}+1}}\right)\,,$$ and $$\left(x_{2},y_{2}\right)\ =\left({\frac {-b_{2}m}{m^{2}+1}},{\frac {b_{2}}{m^{2}+1}}\right)\,.$$ The distance between the points is $$d={\sqrt {\left({\frac {b_{1}m-b_{2}m}{m^{2}+1}}\right)^{2}+\left({\frac {b_{2}-b_{1}}{m^{2}+1}}\right)^{2}}}\,,$$ which reduces to $$d={\frac {|b_{2}-b_{1}|}{{\sqrt {m^{2}+1}}}}\,.$$ When the lines are given by $$ax+by+c_{1}=0\,$$ $$ax+by+c_{2}=0,\,$$ the distance between them can be expressed as $$d={\frac {|c_{2}-c_{1}|}{{\sqrt {a^{2}+b^{2}}}}}.$$

Distance between two straight lines - Wikipedia
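Both forms of the parallel-line distance formula can be checked against each other in a short sketch (the same pair of lines written both ways):

```python
from math import sqrt

def parallel_distance_slope(m, b1, b2):
    """Distance between y = m*x + b1 and y = m*x + b2."""
    return abs(b2 - b1) / sqrt(m * m + 1)

def parallel_distance_general(a, b, c1, c2):
    """Distance between a*x + b*y + c1 = 0 and a*x + b*y + c2 = 0."""
    return abs(c2 - c1) / sqrt(a * a + b * b)

# y = x and y = x + 2 are the same lines as x - y = 0 and x - y + 2 = 0,
# so both formulas should give the same distance (2 / sqrt(2) = sqrt(2)).
```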

#### Annotation 4763885899020

 #MLBook #accuracy #classification-learning-algorithm #decision-boundary Any classification learning algorithm that builds a model implicitly or explicitly creates a decision boundary. The decision boundary can be straight, or curved, or it can have a complex form, or it can be a superposition of some geometrical figures. The form of the decision boundary determines the accuracy of the model (that is, the ratio of examples whose labels are predicted correctly). The form of the decision boundary, the way it is algorithmically or mathematically computed based on the training data, differentiates one learning algorithm from another.
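Accuracy as defined here (the ratio of correctly predicted labels) is a one-liner; the toy labels below are invented for illustration:

```python
# Toy true labels and model predictions (invented for illustration).
y_true = ["spam", "not_spam", "spam", "spam", "not_spam"]
y_pred = ["spam", "not_spam", "not_spam", "spam", "not_spam"]

# Ratio of examples whose labels are predicted correctly.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```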


#### Annotation 4763889831180

 #MLBook #learning-algorithms #machine-learning #prediction-processing-time #speed-of-model-building In practice, there are two other essential differentiators of learning algorithms to consider: speed of model building and prediction processing time. In many practical cases, you would prefer a learning algorithm that builds a less accurate model fast. Additionally, you might prefer a less accurate model that is much quicker at making predictions.


#### Annotation 4763892452620

 [unknown IMAGE 4763872791820] #MLBook #error #has-images #machine-learning #prediction #probability Why is a machine-learned model capable of correctly predicting the labels of new, previously unseen examples? To understand that, look at the plot in Figure 1. If two classes are separable from one another by a decision boundary, then, obviously, examples that belong to each class are located in the two different subspaces that the decision boundary creates. If the examples used for training were selected randomly, independently of one another, and following the same procedure, then, statistically, it is more likely that a new negative example will be located on the plot somewhere not too far from other negative examples. The same holds for a new positive example: it will likely come from the surroundings of other positive examples. In such a case, our decision boundary will still, with high probability, separate new positive and negative examples well from one another. For other, less likely situations, our model will make errors, but because such situations are less likely, the number of errors will likely be smaller than the number of correct predictions. Intuitively, the larger the set of training examples, the less likely it is that the new examples will be dissimilar to (and lie on the plot far from) the examples used for training.


#### Annotation 4763896909068

 #MLBook #machine-learning #set A set is an unordered collection of unique elements.


#### Annotation 4763899268364

 #MLBook #cardinality-operator #machine-learning The cardinality operator $$\left\vert \mathcal S \right\vert$$ returns the number of elements in set $$\mathcal S$$.


#### Annotation 4763901627660

 #MLBook #codomain #domain #function #machine-learning A function is a relation that associates each element $$x$$ of a set $$\mathcal X$$ , the domain of the function, to a single element $$y$$ of another set $$\mathcal Y$$ , the codomain of the function.


#### Annotation 4763903986956

 [unknown IMAGE 4763906608396] #MLBook #has-images #local-minimum #machine-learning We say that $$f(x)$$ has a local minimum at $$x = c$$ if $$f(x) \ge f(c)$$ for every $$x$$ in some open interval around $$x = c$$.


#### Annotation 4763910016268

 #MLBook #machine-learning #vector-function A vector function, denoted as $$\mathbf y = \mathbf f(x)$$, is a function that returns a vector $$\mathbf y$$. It can have a vector or a scalar argument.


#### Annotation 4763912375564

 #MLBook #arg-max #machine-learning #max Given a set of values $$\mathcal A = \{a_1, a_2, \ldots, a_n \}$$, the operator $$\max_{a \in \mathcal A} f(a)$$ returns the highest value of $$f(a)$$ over all elements of the set $$\mathcal A$$. On the other hand, the operator $$\arg \max_{a \in \mathcal A} f(a)$$ returns the element of the set $$\mathcal A$$ that maximizes $$f(a)$$. Sometimes, when the set is implicit or infinite, we can write $$\max_a f(a)$$ or $$\arg \max_a f(a)$$.
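Python's built-in `max` expresses both operators; the set and the function below are invented for illustration:

```python
# max vs arg max over a finite set A, for f(a) = -(a - 3)^2,
# which peaks at a = 3 with value 0.
A = [0, 1, 2, 3, 4, 5]
f = lambda a: -(a - 3) ** 2

max_value = max(f(a) for a in A)  # highest value of f over A
arg_max = max(A, key=f)           # element of A that achieves it
```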


#### Annotation 4763914734860

 #MLBook #gradient #machine-learning #partial-derivatives Gradient is the generalization of derivative for functions that take several inputs (or one input in the form of a vector or some other complex structure). A gradient of a function is a vector of partial derivatives. You can look at finding a partial derivative of a function as the process of finding the derivative by focusing on one of the function’s inputs and by considering all other inputs as constant values.


#### Annotation 4763917094156

 #MLBook #gradient #machine-learning The gradient of a function $$f \left( \left[ x^{(1)}, x^{(2)} \right] \right)$$, denoted as $$\nabla f$$, is given by the vector $$\left[ \frac{\partial f}{\partial x^{(1)}}, \frac{\partial f}{\partial x^{(2)}} \right]$$.
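A gradient can be approximated numerically exactly as described above: vary one input while holding the others constant. This finite-difference sketch is an illustration, not the book's code:

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f: R^n -> R at point x (a list) by
    central differences: perturb one input, hold the others constant."""
    grad = []
    for j in range(len(x)):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

# f([x1, x2]) = x1^2 + 3*x2 has gradient [2*x1, 3].
g = numerical_gradient(lambda v: v[0] ** 2 + 3 * v[1], [2.0, 5.0])
```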


#### Annotation 4763920502028

 #MLBook #machine-learning #random-variable A random variable, usually written as an italic capital letter, like $$X$$ , is a variable whose possible values are numerical outcomes of a random phenomenon. Remark: In the following, the author gives an example in which red, yellow, and blue are possible values. So, the outcomes are not necessarily numbers.


#### Annotation 4763926007052

 [unknown IMAGE 4763923909900] #MLBook #has-images #machine-learning #pmf #probability-distribution #probability-mass-function The probability distribution of a discrete random variable is described by a list of probabilities associated with each of its possible values. This list of probabilities is called a probability mass function (pmf). For example: $$\operatorname{Pr}(X=red) = 0.3$$, $$\operatorname{Pr}(X=yellow) = 0.45$$, $$\operatorname{Pr}(X=blue) = 0.25$$. Each probability in a probability mass function is a value greater than or equal to 0. The sum of probabilities equals 1 (Figure 3a).
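The pmf from the example can be written down and its two defining properties checked directly:

```python
# The pmf from the example as a dictionary: a valid pmf has
# nonnegative probabilities that sum to 1.
pmf = {"red": 0.3, "yellow": 0.45, "blue": 0.25}

assert all(p >= 0 for p in pmf.values())
total = sum(pmf.values())
```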


#### Annotation 4763929677068

 [unknown IMAGE 4763923909900] #MLBook #has-images #machine-learning #pdf #probability-density-function Because the number of values of a continuous random variable $$X$$ is infinite, the probability $$\operatorname{Pr}(X=c)$$ for any $$c$$ is 0. Therefore, instead of a list of probabilities, the probability distribution of a continuous random variable (a continuous probability distribution) is described by a probability density function (pdf). The pdf is a function whose codomain is nonnegative and whose area under the curve equals 1 (Figure 3b).


#### Annotation 4763933609228

 #MLBook #expectation #expected-value #machine-learning #statistics Let a discrete random variable $$X$$ have $$k$$ possible values $$\{ x_i \}_{i=1}^k$$. The expectation of $$X$$ denoted as $$\mathbb E[X]$$ is given by, \begin{align} \mathbb E[X] & \stackrel{\textrm{def}}{=} \sum_{i=1}^k \left[ x_i \cdot \textrm{Pr} \left( X = x_i \right) \right] \\ & = x_1 \cdot \textrm{Pr} \left( X = x_1 \right) + x_2 \cdot \textrm{Pr} \left( X = x_2 \right) + \cdots + x_k \cdot \textrm{Pr} \left( X = x_k \right) \end{align} where $$\textrm{Pr} \left( X = x_i \right)$$ is the probability that $$X$$ has the value $$x_i$$ according to the pmf. The expectation of a random variable is also called the mean, average or expected value and is frequently denoted with the letter $$\mu$$ . The expectation is one of the most important statistics of a random variable.
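The definition translates directly into a weighted sum; the pmf below is invented for illustration:

```python
# E[X] = sum_i x_i * Pr(X = x_i), for a made-up pmf over {1, 2, 3}.
values = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]

mu = sum(x * p for x, p in zip(values, probs))
# 1*0.2 + 2*0.5 + 3*0.3 = 2.1
```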


#### Annotation 4763936230668

 #MLBook #machine-learning #standard-deviation #variance Another important statistic is the standard deviation, defined as, $$\sigma \stackrel{\textrm{def}}{=} \sqrt{\mathbb E \left[ \left( X - \mu\right)^2 \right] }.$$ Variance, denoted as $$\sigma^2$$ or $$var(X)$$, is defined as, $$\sigma^2 \stackrel{\textrm{def}}{=} \mathbb E \left[ \left( X - \mu\right)^2 \right].$$ For a discrete random variable, the standard deviation is given by: $$\sigma \stackrel{\textrm{def}}{=} \sqrt{\textrm{Pr} \left( X = x_1 \right) \left( x_1 - \mu \right)^2 + \textrm{Pr} \left( X = x_2 \right) \left( x_2 - \mu \right)^2 + \cdots + \textrm{Pr} \left( X = x_k \right) \left( x_k - \mu \right)^2},$$ where $$\mu = \mathbb E \left[ X \right]$$.
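Continuing with an invented pmf, variance and standard deviation follow the definitions term by term:

```python
from math import sqrt

# A made-up pmf over {1, 2, 3} with E[X] = 2.1.
values = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]

mu = sum(x * p for x, p in zip(values, probs))               # E[X]
var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))  # sigma^2
sigma = sqrt(var)                                            # sigma
```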


#### Annotation 4763940687116

 #MLBook #continuous-random-variable #expectation #machine-learning The expectation of a continuous random variable $$X$$ is given by, $$\mathbb E \left[ X \right] \stackrel{\textrm{def}}{=} \int_{\mathbb R} x f_X \left( x \right) dx,$$ where $$f_X$$ is the pdf of the variable $$X$$ and $$\int_{\mathbb R}$$ is the integral of function $$x f_X$$ .


#### Annotation 4763943046412

 #MLBook #dataset #examples #machine-learning #sample Most of the time we don’t know $$f_X$$ , but we can observe some values of $$X$$. In machine learning, we call these values examples, and the collection of these examples is called a sample or a dataset.


#### Annotation 4763946716428

 #MLBook #machine-learning #sample-statistic #unbiased-estimators Because $$f_X$$ is usually unknown, but we have a sample $$S_X = \{ x_i \}_{i=1}^N$$, we often content ourselves not with the true values of statistics of the probability distribution, such as expectation, but with their unbiased estimators. We say that $$\hat{\theta} \left( S_X \right)$$ is an unbiased estimator of some statistic $$\theta$$ calculated using a sample $$S_X$$ drawn from an unknown probability distribution if $$\hat{\theta} \left( S_X \right)$$ has the following property: $$\mathbb E \left[ \hat{\theta} \left( S_X \right) \right] = \theta,$$ where $$\hat{\theta}$$ is a sample statistic obtained using a sample $$S_X$$, not the real statistic $$\theta$$ that can be obtained only by knowing $$X$$; the expectation is taken over all possible samples drawn from $$X$$. Intuitively, this means that if you could have an unlimited number of samples like $$S_X$$, and you computed some unbiased estimator, such as $$\hat{\mu}$$, using each sample, then the average of all these $$\hat{\mu}$$ would equal the real statistic $$\mu$$ that you would get by computing it on $$X$$.


#### Annotation 4764500626700

 #MLBook #machine-learning #sample-mean It can be shown that an unbiased estimator of an unknown $$\mathbb E \left[ X \right]$$ (given by either eq. 1 or eq. 2) is given by $$\frac{1}{N} \sum_{i=1}^N x_i$$ (called in statistics the sample mean).
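A quick simulation illustrates the claim: average many sample means drawn from a known distribution (a fair die, whose expectation is 3.5) and the result approaches the true expectation. The die and the sample sizes are invented for illustration:

```python
import random

random.seed(0)

def sample_mean(n):
    """Sample mean of n draws from a fair six-sided die (E[X] = 3.5)."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Average the unbiased estimator over many independent samples.
estimates = [sample_mean(50) for _ in range(2000)]
avg_of_estimates = sum(estimates) / len(estimates)
```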


#### Annotation 4764502985996

 #Bayes-rule #Bayes-theorem #MLBook #machine-learning The conditional probability $$\textrm{Pr} \left( X=x \vert Y=y \right)$$ is the probability that the random variable $$X$$ has a specific value $$x$$ given that another random variable $$Y$$ has a specific value $$y$$. Bayes’ Rule (also known as Bayes’ Theorem) stipulates that: $$\textrm{Pr} \left( X=x \vert Y=y \right) = \displaystyle \frac{\textrm{Pr} \left( Y=y \vert X=x \right) \textrm{Pr} \left( X=x \right)}{\textrm{Pr} \left( Y=y \right)}$$.
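Bayes' Rule in code, with invented numbers for a spam-flavored example; the denominator Pr(Y=y) is expanded by the law of total probability:

```python
# Invented numbers: Pr(word | spam) = 0.6, Pr(spam) = 0.2,
# Pr(word | not_spam) = 0.1, Pr(not_spam) = 0.8.
p_word_given_spam = 0.6
p_spam = 0.2
p_word = 0.6 * 0.2 + 0.1 * 0.8  # total probability of seeing the word

# Bayes' Rule: Pr(spam | word) = Pr(word | spam) Pr(spam) / Pr(word).
p_spam_given_word = p_word_given_spam * p_spam / p_word
```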


#### Annotation 4764513733900

 #MLBook #machine-learning #review 2.5 Parameter Estimation


#### Annotation 4764516093196

 #MLBook #hyperparameter #machine-learning A hyperparameter is a property of a learning algorithm, usually (but not always) having a numerical value. That value influences the way the algorithm works. Hyperparameters aren’t learned by the algorithm itself from data. They have to be set by the data analyst before running the algorithm. I show how to do that in Chapter 5.


#### Annotation 4764518452492

 #MLBook #machine-learning #parameters Parameters are variables that define the model learned by the learning algorithm. Parameters are directly modified by the learning algorithm based on the training data. The goal of learning is to find such values of parameters that make the model optimal in a certain sense.


#### Annotation 4764520811788

 #MLBook #classification #label #machine-learning #unlabeled-example Classification is a problem of automatically assigning a label to an unlabeled example. Spam detection is a famous example of classification.


#### Annotation 4764523171084

 #MLBook #classification-learning-algorithm #labeled-examples #machine-learning #model In machine learning, the classification problem is solved by a classification learning algorithm that takes a collection of labeled examples as inputs and produces a model that can take an unlabeled example as input and either directly output a label or output a number that can be used by the analyst to deduce the label. An example of such a number is a probability.


#### Annotation 4764525530380

 #MLBook #binary-classification #binomial-classification #classes #machine-learning #multiclass-classification #multinomial-classification In a classification problem, a label is a member of a finite set of classes. If the size of the set of classes is two (“sick”/“healthy”, “spam”/“not_spam”), we talk about binary classification (also called binomial in some sources). Multiclass classification (also called multinomial) is a classification problem with three or more classes.


#### Annotation 4764528413964

 #MLBook #machine-learning #regression #target Regression is a problem of predicting a real-valued label (often called a target) given an unlabeled example. Estimating house price valuation based on house features, such as area, the number of bedrooms, location and so on is a famous example of regression.


#### Annotation 4764530773260

 #MLBook #machine-learning #regression-learning-algorithm The regression problem is solved by a regression learning algorithm that takes a collection of labeled examples as inputs and produces a model that can take an unlabeled example as input and output a target.


#### Annotation 4764533132556

 #MLBook #machine-learning #model-based-learning #model-parameters Most supervised learning algorithms are model-based. We have already seen one such algorithm: SVM. Model-based learning algorithms use the training data to create a model that has parameters learned from the training data. In SVM, the two parameters we saw were $$\mathbf w^\ast$$ and $$b^\ast$$ . After the model was built, the training data can be discarded.


#### Annotation 4764536016140

 #MLBook #instance-based #k-nearest-neighbors #kNN #learning #machine-learning Instance-based learning algorithms use the whole dataset as the model. One instance-based algorithm frequently used in practice is k-Nearest Neighbors (kNN). In classification, to predict a label for an input example the kNN algorithm looks at the close neighborhood of the input example in the space of feature vectors and outputs the label that it saw the most often in this close neighborhood.
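A minimal kNN sketch (not the book's implementation; the toy dataset is invented) using Euclidean distance and a majority vote over the k nearest neighbors:

```python
from collections import Counter
from math import dist

def knn_predict(dataset, x, k=3):
    """dataset: list of (feature_vector, label) pairs. Predict the label
    of x as the most common label among its k nearest neighbors."""
    neighbors = sorted(dataset, key=lambda pair: dist(pair[0], x))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

# Toy 2-D dataset with two classes "a" and "b".
data = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
        ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
```

A query near the "a" cluster is labeled "a"; one near the "b" cluster is labeled "b".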


#### Annotation 4764539424012

 #MLBook #deep-learning #deep-neural-networks #layer #machine-learning #neural-network #shallow-learning A shallow learning algorithm learns the parameters of the model directly from the features of the training examples. Most supervised learning algorithms are shallow. The notable exceptions are neural network learning algorithms, specifically those that build neural networks with more than one layer between input and output. Such neural networks are called deep neural networks. In deep neural network learning (or, simply, deep learning), contrary to shallow learning, most model parameters are learned not directly from the features of the training examples, but from the outputs of the preceding layers.


#### Annotation 4764615707916

 #Clinique #EBM #Médecine #Sémiologie Sensitivity is the proportion of patients with the diagnosis who have the physical sign (i.e., have the positive result). Specificity is the proportion of patients without the diagnosis who lack the physical sign (i.e., have the negative result)
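Both proportions follow directly from confusion-matrix counts; the numbers below are invented for illustration:

```python
# Counts (invented): among 100 patients WITH the diagnosis, 80 have the
# sign (true positives) and 20 lack it (false negatives); among 100
# patients WITHOUT the diagnosis, 90 lack the sign (true negatives)
# and 10 have it (false positives).
tp, fn = 80, 20
tn, fp = 90, 10

sensitivity = tp / (tp + fn)  # proportion of diseased with the sign
specificity = tn / (tn + fp)  # proportion of non-diseased without it
```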


#### Annotation 4768344444172

 The two ovaries contain thousands of follicles, each with an oocyte surrounded by a layer of granulosa cells and thecal cells. These supporting cells produce steroids and paracrine products important in follicular maturation and the coordination of events in reproduction


#### Annotation 4768349424908

 Until week 8 of gestation, the sex of the embryo cannot be determined morphologically; therefore, this period is termed the indifferent phase of sexual development. After this time, differentiation of the internal and external genitalia occurs, determining the phenotypic sex of the individual, which becomes fully developed after puberty.


#### Annotation 4768350473484

 After 8 weeks of gestation, the production of anti-müllerian hormone by Sertoli cells in the fetal testes leads to regression of the müllerian ducts, whereas production of testosterone by the Leydig cells leads to the persistence of the wolffian duct and the subsequent development of the prostate, epididymis, and seminal vesicles. In the absence of these secretions, female internal reproductive organs are formed from the müllerian ducts, and the wolffian structures degenerate.


#### Annotation 4768355192076

 #Médecine #Pathophysiology-Of-Disease #Physiologie During female development, the ovaries contain about 7 million oogonia by 24 weeks of gestation. The majority of these cells die during intrauterine life, leaving only about 1 million primary oocytes at birth. This decreases to about 400,000 by puberty.


#### Annotation 4768356502796

 #Médecine #Pathophysiology-Of-Disease #Physiologie The surviving oogonia are arrested at the prophase of meiosis I. Completion of the first meiotic division does not occur until the time of ovulation, and the second meiosis is completed with fertilization. Only about 400 of these oocytes mature and are released by ovulation during a woman’s lifetime; the others undergo atresia at various stages of development.


#### Annotation 4768359124236

 #Médecine #Pathophysiology-Of-Disease #Physiologie The changes that occur in the brain and hypothalamus that initiate the onset of puberty involve, first, the establishment of sleep-dependent and, later, the truly pulsatile release of gonadotropin-releasing hormone (GnRH) from the hypothalamus. The hypothalamic kisspeptin/GPR54 ligand/receptor pair appears to be the key mediator of the onset of puberty.


#### Annotation 4768360172812

 #Médecine #Pathophysiology-Of-Disease #Physiologie Before about age 10 years in girls, gonadotropin secretion is at low levels and does not display a pulsatile character. After this age, the pulsatile release of GnRH begins and initiates folliculogenesis, leading to cyclic changes in estrogen and progesterone production. These changes allow estrogen-dependent tissues, such as the breasts and the endometrium, to begin their maturation. The appearance of breast development is referred to as thelarche, and the first menstrual period is termed menarche.


#### Annotation 4768362269964

 #Médecine #Pathophysiology-Of-Disease #Physiologie The menstrual cycle has three phases. The follicular phase typically lasts 12–14 days and culminates in the production of a mature oocyte. Initially, a cohort of follicles begins to grow, but ultimately a single dominant follicle is selected, and the rest undergo a process of degeneration and apoptotic death, termed atresia (Figure 22–5). The follicular phase is followed by ovulation, in which the dominant follicle releases its mature oocyte to be transported through the uterine tubes for fertilization and subsequent implantation in a receptive uterus. The third, luteal, phase also averages 14 days and is characterized by luteinization of the ruptured follicle to produce the corpus luteum.


#### Annotation 4768364629260

 #Médecine #Pathophysiology-Of-Disease #Physiologie Neurons within the hypothalamus synthesize the peptide GnRH, and its secretion is modulated by endogenous opioids and corticotropin-releasing hormone (CRH). GnRH is secreted directly into the portal circulation of the pituitary in a pulsatile fashion. This pulsatility is required for proper activation of its receptor located on the gonadotropes, which are cells located in the anterior pituitary. In response, the gonadotropes secrete the polypeptides FSH and LH, collectively called gonadotropins, which stimulate the ovary to produce estrogen and inhibin. Inhibin feeds back to suppress FSH secretion but has no effect on LH. Estrogen also affects the pituitary by increasing the number of GnRH receptors and its sensitivity to GnRH stimulation. With estradiol production by the ovaries, a critical concentration is reached for a sufficient time to induce a midcycle LH surge and subsequent ovulation. After this surge, high levels of progesterone produced by the corpus luteum suppress gonadotropin release for the duration of the luteal phase.


#### Annotation 4768366726412

 #Médecine #Pathophysiology-Of-Disease #Physiologie Activin acts in the ovary to augment the effect of FSH, increasing aromatase activity and increasing the production of FSH and LH receptors.


#### Annotation 4768367774988

 #Médecine #Pathophysiology-Of-Disease #Physiologie During the early follicular phase, FSH stimulates the growth of a cohort of follicles and increases the production of inhibin and activin in granulosa cells.


#### Annotation 4768369872140

 #Médecine #Pathophysiology-Of-Disease #Physiologie LH stimulates the production of androgens in the thecal cells, which is augmented by inhibin. Androgens diffuse into the granulosa cells to be converted to estrogens through the enzymatic reaction of aromatization.


#### Annotation 4768371445004

 #Médecine #Pathophysiology-Of-Disease #Physiologie The midcycle LH surge triggers the final steps of oocyte maturation and the resumption of meiosis within the dominant oocyte.


#### Annotation 4768372493580

 #Médecine #Pathophysiology-Of-Disease #Physiologie Continued secretion from the corpus luteum requires LH (or human chorionic gonadotropin [hCG], as discussed below) stimulation; in its absence, degeneration occurs.


#### Annotation 4768373542156

 #Médecine #Pathophysiology-Of-Disease #Physiologie During the follicular phase, the endometrium proliferates under the influence of estrogen, creating straight glands with thin secretions and microvascular proliferation. During the luteal phase, the high levels of estradiol and progesterone promote the maturation of the endometrium, which develops tortuous glands engorged with thick secretions and proteins (see Figure 22–2). Additionally, the endometrium secretes a number of endocrine and paracrine factors (Table 22–1). These changes optimize the environment for implantation. In the absence of pregnancy, the corpus luteum cannot sustain the high levels of progesterone production, and the endometrial vasculature cannot be maintained. This leads to a sloughing of the endometrium and the onset of menstruation, which is marked by the nadir of estradiol and progesterone levels, ending the cycle.


#### Annotation 4768374590732

 #Médecine #Pathophysiology-Of-Disease #Physiologie Most preparations of estrogen and progestin block the LH surge at midcycle, thereby preventing ovulation. However, other contraceptive actions include effects on estrogen- and progesterone-sensitive tissues, such as inducing antifertility changes in cervical mucus and the endometrial lining that are unfavorable to sperm transport and embryonic implantation, respectively.