on 07-Feb-2020 (Fri)

Annotation 4763933609228

 #MLBook #expectation #expected-value #machine-learning #statistics Let a discrete random variable $$X$$ have $$k$$ possible values $$\{ x_i \}_{i=1}^k$$. The expectation of $$X$$, denoted as $$\mathbb E[X]$$, is given by \begin{align} \mathbb E[X] & \stackrel{\textrm{def}}{=} \sum_{i=1}^k \left[ x_i \cdot \textrm{Pr} \left( X = x_i \right) \right] \\ & = x_1 \cdot \textrm{Pr} \left( X = x_1 \right) + x_2 \cdot \textrm{Pr} \left( X = x_2 \right) + \cdots + x_k \cdot \textrm{Pr} \left( X = x_k \right) \end{align} where $$\textrm{Pr} \left( X = x_i \right)$$ is the probability that $$X$$ has the value $$x_i$$ according to the probability mass function (pmf). The expectation of a random variable is also called the mean, average or expected value and is frequently denoted with the letter $$\mu$$. The expectation is one of the most important statistics of a random variable.
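The definition can be sketched directly in Python, assuming the pmf is supplied as two parallel lists of values and probabilities (the function name `expectation` is illustrative, not from the book). `fractions.Fraction` keeps the fair-die example exact.

```python
import math
from fractions import Fraction


def expectation(values, probs):
    """E[X] = sum_i x_i * Pr(X = x_i) for a discrete random variable X.

    `values` and `probs` are parallel sequences describing the pmf.
    """
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1):
        raise ValueError("probs must be non-negative and sum to 1")
    return sum(x * p for x, p in zip(values, probs))


# Fair six-sided die: each face 1..6 has probability 1/6.
print(expectation([1, 2, 3, 4, 5, 6], [Fraction(1, 6)] * 6))  # 7/2
```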

pdf

Annotation 4769785449740

 [unknown IMAGE 4773033413900] #MLBook #binary-classification #has-images #logistic-regression #machine-learning #problem-statement #sigmoid-function #standard-logistic-function In logistic regression, we still want to model $$y_i$$ as a linear function of $$\mathbf x_i$$; however, with a binary $$y_i$$ this is not straightforward. The linear combination of features such as $$\mathbf w \mathbf x_i + b$$ is a function that spans from minus infinity to plus infinity, while $$y_i$$ has only two possible values. At a time when the absence of computers required scientists to perform calculations by hand, they were eager to find a linear classification model. They figured out that if we define the negative label as 0 and the positive label as 1, we would just need to find a simple continuous function whose codomain is $$(0, 1)$$. In such a case, if the value returned by the model for input $$\mathbf x$$ is closer to 0, then we assign a negative label to $$\mathbf x$$; otherwise, the example is labeled as positive. One function that has such a property is the standard logistic function (also known as the sigmoid function): $$f(x) = \displaystyle \frac{1}{1 + e^{-x}}$$, where $$e$$ is the base of the natural logarithm (also called Euler’s number; $$e^x$$ is also known as the $$\exp(x)$$ function in programming languages). Its graph is depicted in Figure 3. The logistic regression model looks like this: $$f_{\mathbf w, b} (\mathbf x) \stackrel{\textrm{def}}{=} \displaystyle \frac{1}{1 + e^{-(\mathbf w \mathbf x + b)}} \quad (3)$$ You can see the familiar term $$\mathbf w \mathbf x + b$$ from linear regression. By looking at the graph of the standard logistic function, we can see how well it fits our classification purpose: if we optimize the values of $$\mathbf w$$ and $$b$$ appropriately, we can interpret the output of $$f( \mathbf x )$$ as the probability of $$y_i$$ being positive.
For example, if it’s higher than or equal to the threshold 0.5 we would say that the class of $$\mathbf x$$ is positive; otherwise, it’s negative. In practice, the choice of the threshold could be different depending on the problem. We return to this discussion in Chapter 5 when we talk about model performance assessment. Now, how do we find optimal $$\mathbf w^\ast$$ and $$b^\ast$$? In linear regression, we minimized the empirical risk which was defined as the average squared error loss, also known as the mean squared error or MSE.
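A minimal Python sketch of model (3) with the 0.5 threshold, assuming dense feature lists; the names `sigmoid`, `predict_proba` and `predict_label` and the weights are hypothetical, since in practice $$\mathbf w$$ and $$b$$ come from training. The sigmoid is written in a numerically stable form so `math.exp` never overflows for large negative inputs.

```python
import math


def sigmoid(z):
    """Standard logistic function f(z) = 1 / (1 + e^(-z)), computed stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite as e^z / (1 + e^z) so math.exp never overflows.
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Logistic regression model (3): sigmoid(w·x + b)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)


def predict_label(w, b, x, threshold=0.5):
    """Positive label (1) if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_proba(w, b, x) >= threshold else 0


# Hypothetical parameters; in practice w and b come from training.
w, b = [2.0, -1.0], 0.5
print(round(predict_proba(w, b, [1.0, 1.0]), 4))  # z = 1.5  -> 0.8176
print(predict_label(w, b, [0.0, 3.0]))            # z = -2.5 -> 0
```

The `threshold` parameter reflects the remark that the cut-off may differ depending on the problem.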

pdf

Annotation 4773346675980

 [unknown IMAGE 4773337763084] #MLBook #hard-margin-SVM #has-images #hinge-loss #machine-learning #noise #soft-margin-SVM #support-vector-machine To extend SVM to cases in which the data is not linearly separable, we introduce the hinge loss function: $$\max (0, 1 − y_i (\mathbf w \mathbf x_i − b))$$. The hinge loss function is zero if the constraints in eq. 8 [i.e., $$\mathbf w \mathbf x_i − b \ge +1 \; \textrm{if} \; y_i = +1$$ and $$\mathbf w \mathbf x_i − b \le -1 \; \textrm{if} \; y_i = -1$$] are satisfied; in other words, if $$\mathbf x_i$$ lies on the correct side of the decision boundary and outside the margin. For data on the wrong side of the decision boundary, the function’s value is proportional to the distance from the decision boundary. We then wish to minimize the following cost function: $$C \left\Vert \mathbf w \right\Vert^2 + \frac{1}{N} \displaystyle \sum_{i=1}^N \max (0, 1 − y_i (\mathbf w \mathbf x_i − b))$$, where the hyperparameter $$C$$ determines the tradeoff between increasing the size of the margin and ensuring that each $$\mathbf x_i$$ lies on the correct side of the decision boundary. The value of $$C$$ is usually chosen experimentally, just like ID3’s hyperparameters $$\epsilon$$ and $$d$$. SVMs that optimize hinge loss are called soft-margin SVMs, while the original formulation is referred to as a hard-margin SVM. As you can see, for sufficiently high values of $$C$$, the second term in the cost function becomes negligible, so the SVM algorithm will try to find the highest margin by completely ignoring misclassification. As we decrease the value of $$C$$, making classification errors becomes more costly, so the SVM algorithm tries to make fewer mistakes by sacrificing margin size. As we have already discussed, a larger margin is better for generalization. Therefore, $$C$$ regulates the tradeoff between classifying the training data well (minimizing empirical risk) and classifying future examples well (generalization).
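The cost function above can be evaluated with a short Python sketch. The helper names and the toy data are hypothetical; this only computes the soft-margin cost for a given $$\mathbf w$$ and $$b$$ and does not minimize it.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(w, b, x, y):
    """max(0, 1 - y (w·x - b)) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * (dot(w, x) - b))


def soft_margin_cost(w, b, X, Y, C):
    """C * ||w||^2 + average hinge loss over the N training examples."""
    regularizer = C * dot(w, w)
    avg_hinge = sum(hinge_loss(w, b, x, y) for x, y in zip(X, Y)) / len(X)
    return regularizer + avg_hinge


# Toy data (hypothetical): the third example is on the wrong side of w·x - b = 0.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
Y = [+1, -1, -1]
w, b = [1.0, 0.0], 0.0
print([hinge_loss(w, b, x, y) for x, y in zip(X, Y)])  # [0.0, 0.0, 1.5]
print(soft_margin_cost(w, b, X, Y, C=0.1))  # 0.1 * ||w||^2 + (0 + 0 + 1.5) / 3
```

Only the misclassified third point contributes to the hinge term; raising `C` makes the $$\left\Vert \mathbf w \right\Vert^2$$ term dominate.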

pdf

Annotation 4773377608972

 [unknown IMAGE 4773373938956] #MLBook #SVM #has-images #machine-learning #non-linearity SVM can be adapted to work with datasets that cannot be separated by a hyperplane in their original space. Indeed, if we manage to transform the original space into a space of higher dimensionality, we could hope that the examples will become linearly separable in this transformed space. In SVMs, using a function to implicitly transform the original space into a higher-dimensional space during the cost function optimization is called the kernel trick. The effect of applying the kernel trick is illustrated in Figure 6. As you can see, it’s possible to transform non-linearly-separable two-dimensional data into linearly separable three-dimensional data using a specific mapping $$\phi: \mathbf x \mapsto \phi (\mathbf x)$$, where $$\phi (\mathbf x)$$ is a vector of higher dimensionality than $$\mathbf x$$. For the example of 2D data in Figure 5 (right), the mapping $$\phi$$ that projects a 2D example $$\mathbf x = \left[ q, p \right]$$ into a 3D space (Figure 6) would look like this: $$\phi \left( \left[ q, p \right] \right) \stackrel{\textrm{def}}{=} \left( q^2, \sqrt{2} qp, p^2\right)$$, where $$\cdot^2$$ means $$\cdot$$ squared. You can now see that the data becomes linearly separable in the transformed space.
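To make the mapping concrete, here is a Python sketch on hypothetical toy data (not the data of Figure 5): points inside the unit circle versus points outside it. In 2D the inner points lie inside the triangle formed by the outer ones, so no line separates them. After applying $$\phi$$, $$q^2 + p^2 = z_1 + z_3$$, so the plane $$z_1 + z_3 = 1$$ does. The last lines check that the dot product in the 3D space equals the squared 2D dot product, which is the quadratic kernel discussed later in this note.

```python
import math


def phi(x):
    """Map a 2D example [q, p] to (q^2, sqrt(2)*q*p, p^2)."""
    q, p = x
    return (q * q, math.sqrt(2) * q * p, p * p)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2D data: class +1 inside the unit circle, class -1 outside it.
inner = [(0.2, 0.1), (-0.3, 0.4), (0.5, -0.5)]
outer = [(1.5, 0.0), (-1.0, 1.2), (0.3, -1.8)]

# In the transformed space the plane with normal w = (1, 0, 1) and
# offset 1 (i.e. z1 + z3 = 1) separates the two groups.
w = (1.0, 0.0, 1.0)
for x in inner:
    assert dot(w, phi(x)) - 1.0 < 0
for x in outer:
    assert dot(w, phi(x)) - 1.0 > 0

# phi(a)·phi(b) equals (a·b)^2 up to floating-point rounding.
a, b = (1.0, 2.0), (3.0, -1.0)
print(dot(phi(a), phi(b)), dot(a, b) ** 2)
```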

pdf

 #MLBook #RBF-kernel #SVM #kernel-functions #machine-learning #non-linearity However, we don’t know a priori which mapping $$\phi$$ would work for our data. If we first transformed all our input examples into very high-dimensional vectors using some mapping, then applied SVM to this data, and repeated this for all possible mapping functions, the computation could become very inefficient, and we would never solve our classification problem. Fortunately, scientists figured out how to use kernel functions (or, simply, kernels) to work efficiently in higher-dimensional spaces without doing this transformation explicitly. To understand how kernels work, we first have to see how the optimization algorithm for SVM finds the optimal values for $$\mathbf w$$ and $$b$$. The method traditionally used to solve the optimization problem in eq. 9 is the method of Lagrange multipliers. Instead of solving the original problem from eq. 9, it is convenient to solve an equivalent problem formulated like this: $$\max_{\alpha_1 \ldots \alpha_N} \displaystyle \sum_{i=1}^N \alpha_i - \frac{1}{2} \sum_{i=1}^N \sum_{k=1}^N y_i \alpha_i (\mathbf x_i \mathbf x_k) y_k \alpha_k \; \textrm{subject to} \; \sum_{i=1}^N \alpha_i y_i = 0 \; \textrm{and} \; \alpha_i \ge 0, i = 1, \ldots, N,$$ where $$\alpha_i$$ are called Lagrange multipliers. When formulated like this, the optimization problem becomes a convex quadratic optimization problem, efficiently solvable by quadratic programming algorithms. Now, you may have noticed that in the above formulation, there is a term $$\mathbf x_i \mathbf x_k$$, and this is the only place where the feature vectors are used. If we want to transform our vector space into a higher-dimensional space, we need to transform $$\mathbf x_i$$ into $$\phi ( \mathbf x_i )$$ and $$\mathbf x_k$$ into $$\phi ( \mathbf x_k )$$ and then compute the dot-product of $$\phi ( \mathbf x_i )$$ and $$\phi ( \mathbf x_k )$$. Doing so would be very costly.
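The observation that feature vectors enter the dual problem only through $$\mathbf x_i \mathbf x_k$$ can be seen in a Python sketch that evaluates (but does not maximize) the dual objective for given multipliers. The function names and the two-point problem are hypothetical; a real solver would hand this objective and its constraints to a quadratic programming routine. Passing a different `kernel` callable is exactly where the kernel trick plugs in.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, X, y, kernel=dot):
    """sum_i alpha_i - 1/2 * sum_i sum_k y_i alpha_i K(x_i, x_k) y_k alpha_k.

    The feature vectors enter only through kernel(x_i, x_k); with the
    default `dot` this is the plain dot product x_i x_k.
    """
    n = len(X)
    quadratic = sum(
        y[i] * alpha[i] * kernel(X[i], X[k]) * y[k] * alpha[k]
        for i in range(n)
        for k in range(n)
    )
    return sum(alpha) - 0.5 * quadratic


def is_feasible(alpha, y, tol=1e-9):
    """Constraints: sum_i alpha_i y_i = 0 and alpha_i >= 0."""
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    return balanced and all(a >= 0 for a in alpha)


# Hypothetical two-point problem: x_1 = (1, 0) with y = +1, x_2 = (-1, 0) with y = -1.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]
alpha = [0.5, 0.5]
print(is_feasible(alpha, y), dual_objective(alpha, X, y))  # True 0.5
```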
On the other hand, we are only interested in the result of the dot-product $$\mathbf x_i \mathbf x_k$$, which, as we know, is a real number. We don’t care how this number was obtained as long as it’s correct. By using the kernel trick, we can avoid the costly transformation of the original feature vectors into higher-dimensional vectors and the computation of their dot-product, replacing both with a simple operation on the original feature vectors that gives the same result. For example, instead of transforming $$( q_1, p_1 )$$ into $$( q_1^2, \sqrt{2} q_1 p_1, p_1^2 )$$ and $$( q_2, p_2 )$$ into $$( q_2^2, \sqrt{2} q_2 p_2, p_2^2 )$$ and then computing the dot-product of $$( q_1^2, \sqrt{2} q_1 p_1, p_1^2 )$$ and $$( q_2^2, \sqrt{2} q_2 p_2, p_2^2 )$$ to obtain $$( q_1^2 q_2^2 + 2 q_1 q_2 p_1 p_2 + p_1^2 p_2^2 )$$, we could find the dot-product between $$( q_1, p_1 )$$ and $$( q_2, p_2 )$$ to get $$( q_1 q_2 + p_1 p_2 )$$ and then square it to get exactly the same result $$( q_1^2 q_2^2 + 2 q_1 q_2 p_1 p_2 + p_1^2 p_2^2 )$$. That was an example of the kernel trick, and we used the quadratic kernel $$k ( \mathbf x_i, \mathbf x_k ) \stackrel{\textrm{def}}{=} ( \mathbf x_i \mathbf x_k )^2$$. Multiple kernel functions exist, the most widely used of which is the RBF kernel: $$k ( \mathbf x, \mathbf x' )= \exp \left( - \frac{\left\Ver...

pdf

Flashcard 4789229718796

Tags
#MLBook #binary-classification #has-images #logistic-regression #machine-learning #problem-statement #sigmoid-function #standard-logistic-function
Question
State the problem in logistic regression.
[unknown IMAGE 4773033413900]

In logistic regression, we still want to model $$y_i$$ as a linear function of $$\mathbf x_i$$; however, with a binary $$y_i$$ this is not straightforward. The linear combination of features such as $$\mathbf w \mathbf x_i + b$$ is a function that spans from minus infinity to plus infinity, while $$y_i$$ has only two possible values.

At a time when the absence of computers required scientists to perform calculations by hand, they were eager to find a linear classification model. They figured out that if we define the negative label as 0 and the positive label as 1, we would just need to find a simple continuous function whose codomain is $$(0, 1)$$. In such a case, if the value returned by the model for input $$\mathbf x$$ is closer to 0, then we assign a negative label to $$\mathbf x$$; otherwise, the example is labeled as positive. One function that has such a property is the standard logistic function (also known as the sigmoid function):

$$f(x) = \displaystyle \frac{1}{1 + e^{-x}}$$,

where $$e$$ is the base of the natural logarithm (also called Euler’s number; $$e^x$$ is also known as the $$\exp(x)$$ function in programming languages). Its graph is depicted in Figure 3.

The logistic regression model looks like this:
$$f_{\mathbf w, b} (\mathbf x) \stackrel{\textrm{def}}{=} \displaystyle \frac{1}{1 + e^{-(\mathbf w \mathbf x + b)}} \quad (3)$$

You can see the familiar term $$\mathbf w \mathbf x + b$$ from linear regression.

By looking at the graph of the standard logistic function, we can see how well it fits our classification purpose: if we optimize the values of $$\mathbf w$$ and $$b$$ appropriately, we could interpret the output of $$f( \mathbf x )$$ as the probability of $$y_i$$ being positive. For example, if it’s higher than or equal to the threshold 0.5 we would say that the class of $$\mathbf x$$ is positive; otherwise, it’s negative. In practice, the choice of the threshold could be different depending on the problem. We return to this discussion in Chapter 5 when we talk about model performance assessment.

Now, how do we find optimal $$\mathbf w^\ast$$ and $$b^\ast$$? In linear regression, we minimized the empirical risk which was defined as the average squared error loss, also known as the mean squared error or MSE.

Original toplevel document (pdf)

Flashcard 4789254098188

Tags
#MLBook #SVM #has-images #linear-regression #machine-learning
Question
Compare the SVM and linear regression models.
[unknown IMAGE 4769622658316]

You may have noticed that the form of our linear model in eq. 1 $$\left[ f_{\mathbf w,b} (\mathbf x) = \mathbf w \mathbf x + b \right]$$ is very similar to the form of the SVM model. The only difference is the missing sign operator. The two models are indeed similar. However, the hyperplane in the SVM plays the role of the decision boundary: it’s used to separate two groups of examples from one another. As such, it has to be as far from each group as possible.

On the other hand, the hyperplane in linear regression is chosen to be as close to all training examples as possible.

You can see why this latter requirement is essential by looking at the illustration in Figure 1. It displays the regression line (in red) for one-dimensional examples (blue dots). We can use this line to predict the value of the target $$y_{new}$$ for a new unlabeled input example $$x_{new}$$. If our examples are $$D$$-dimensional feature vectors (for $$D > 1$$), the only difference from the one-dimensional case is that the regression model is not a line but a plane (for two dimensions) or a hyperplane (for $$D > 2$$).
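A small Python sketch of the comparison: both models compute the same kind of linear expression, and the SVM only wraps it in a sign operator. It follows the book's SVM convention $$\mathrm{sign}(\mathbf w \mathbf x - b)$$, while eq. 1 uses $$+ b$$; the sign of $$b$$ is just a parameterization. Returning $$+1$$ when the expression is exactly zero is an arbitrary choice made for this sketch, and all names and numbers are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_regression_predict(w, b, x):
    """Linear regression, eq. 1: f(x) = w·x + b (a real-valued prediction)."""
    return dot(w, x) + b


def svm_predict(w, b, x):
    """SVM: the sign of w·x - b, i.e. which side of the hyperplane x lies on."""
    return 1 if dot(w, x) - b >= 0 else -1


# Hypothetical one-dimensional parameters.
w, b = [2.0], 1.0
print(linear_regression_predict(w, b, [3.0]), svm_predict(w, b, [3.0]))  # 7.0 1
print(linear_regression_predict(w, b, [0.0]), svm_predict(w, b, [0.0]))  # 1.0 -1
```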

Original toplevel document (pdf)

Flashcard 4789274545420

Tags
#MLBook #has-images #linear-regression #machine-learning #overfitting
Question
Discuss overfitting in linear regression.
[unknown IMAGE 4789270351116]
One practical justification of the choice of the linear form for the model is that it’s simple. Why use a complex model when you can use a simple one? Another consideration is that linear models rarely overfit. Overfitting is the property of a model such that the model predicts the labels of the examples used during training very well but frequently makes errors when applied to examples that weren’t seen by the learning algorithm during training. An example of overfitting in regression is shown in Figure 2. The data used to build the red regression line is the same as in Figure 1. The difference is that this time, this is polynomial regression with a polynomial of degree 10. The regression line predicts the targets of almost all training examples almost perfectly, but will likely make significant errors on new data, as you can see in Figure 1 for $$x_{new}$$. We talk more about overfitting and how to avoid it in Chapter 5.
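The effect can be reproduced in pure Python on hypothetical data (not the data of Figures 1 and 2): 11 points from $$y = 2x + 1$$ with small alternating noise. With 11 points, a least-squares polynomial of degree 10 passes through every point, so this sketch swaps in Lagrange interpolation to build it without a linear-algebra library. The degree-10 model reaches zero training error but extrapolates wildly to an unseen $$x_{new} = 12$$, while the straight line stays close to the noise-free target 25.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return lambda x: w * x + (my - w * mx)


def fit_interpolating_polynomial(xs, ys):
    """Degree-(N-1) polynomial through all N points (Lagrange form).

    With 11 points this is degree 10, and it coincides with the least-squares
    polynomial fit of degree 10, which reaches zero training error.
    """
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return p


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: y = 2x + 1 plus small alternating noise, x = 0..10.
xs = [float(i) for i in range(11)]
ys = [2 * x + 1 + 0.5 * (-1) ** i for i, x in enumerate(xs)]

line = fit_line(xs, ys)
poly = fit_interpolating_polynomial(xs, ys)

print("training MSE:", mse(line, xs, ys), mse(poly, xs, ys))
x_new = 12.0  # noise-free target: 2 * 12 + 1 = 25
print("prediction at x_new:", line(x_new), poly(x_new))
```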

Original toplevel document (pdf)

Annotation 4818832854284

 #MLBook #cosine-similarity #k-nearest-neighbors #kNN #machine-learning k-Nearest Neighbors (kNN) is a non-parametric learning algorithm. Unlike other learning algorithms that allow discarding the training data after the model is built, kNN keeps all training examples in memory. Once a new, previously unseen example $$\mathbf x$$ comes in, the kNN algorithm finds the $$k$$ training examples closest to $$\mathbf x$$ and returns the majority label, in case of classification, or the average label, in case of regression. The closeness of two examples is given by a distance function. For example, the Euclidean distance seen above is frequently used in practice. Another popular choice of the distance function is the negative cosine similarity. Cosine similarity, defined as $$s \left( \mathbf x_i, \mathbf x_k \right) \stackrel{\textrm{def}}{=} \cos \left( \angle \left( \mathbf x_i, \mathbf x_k \right) \right) = \frac{\sum_{j = 1}^D x_i^{(j)} x_k^{(j)}}{\sqrt{\sum_{j=1}^D \left( x_i^{(j)}\right)^2} \sqrt{\sum_{j=1}^D \left( x_k^{(j)}\right)^2}}$$, is a measure of similarity of the directions of two vectors. If the angle between two vectors is 0 degrees, then the two vectors point in the same direction, and cosine similarity is equal to 1. If the vectors are orthogonal, the cosine similarity is 0. For vectors pointing in opposite directions, the cosine similarity is −1. If we want to use cosine similarity as a distance metric, we need to multiply it by −1. Other popular distance metrics include Chebyshev distance, Mahalanobis distance, and Hamming distance. The distance metric and the value of $$k$$ are chosen by the analyst before running the algorithm, so they are hyperparameters. The distance metric could also be learned from data (as opposed to guessing it). We talk about that in Chapter 10.
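A minimal kNN sketch in Python using the negative cosine similarity as the distance, assuming non-zero feature vectors (cosine similarity is undefined for a zero vector). The function names and training data are hypothetical; a brute-force sort over all training examples is used for clarity rather than speed.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def negative_cosine(a, b):
    """Turn the similarity into a distance-like score: smaller means closer."""
    return -cosine_similarity(a, b)


def knn_predict(train_X, train_y, x, k, distance=negative_cosine, classify=True):
    """Majority label (classification) or mean label (regression) of the k nearest examples."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], x))[:k]
    labels = [label for _, label in nearest]
    if classify:
        return Counter(labels).most_common(1)[0][0]
    return sum(labels) / len(labels)


# Hypothetical training set: "a" points lie near the first axis, "b" near the second.
train_X = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
train_y = ["a", "a", "b", "b"]
print(knn_predict(train_X, train_y, (1.0, 0.2), k=3))  # a
```

Because `distance` and `k` are plain arguments, the sketch mirrors the point that both are hyperparameters chosen before running the algorithm.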

pdf

Annotation 4822519385356

 #L1-regularization #MLBook #hyperparameter #machine-learning Recall the linear regression objective: $$\displaystyle \min_{\mathbf w, b} \frac{1}{N} \displaystyle \sum_{i=1}^N \left( f_{\mathbf w, b} \left( \mathbf x_i \right) - y_i \right)^2. \tag{2}$$ An L1-regularized objective looks like this: $$\displaystyle \min_{\mathbf w, b} \left[ C \left\vert \mathbf w \right\vert + \frac{1}{N} \displaystyle \sum_{i=1}^N \left( f_{\mathbf w, b} \left( \mathbf x_i \right) - y_i \right)^2 \right], \tag{3}$$ where $$\left\vert \mathbf w \right\vert \stackrel{\textrm{def}}{=} \sum_{j=1}^D \left\vert w^{(j)} \right\vert$$ and $$C$$ is a hyperparameter that controls the importance of regularization. If we set $$C$$ to zero, the model becomes a standard non-regularized linear regression model. On the other hand, if we set $$C$$ to a high value, the learning algorithm will try to set most $$w^{(j)}$$ to a very small value or zero to minimize the objective, and the model will become very simple, which can lead to underfitting. Your role as the data analyst is to find a value of the hyperparameter $$C$$ that doesn’t increase the bias too much but reduces the variance to a level reasonable for the problem at hand.
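Objective (3) can be evaluated for a linear $$f_{\mathbf w, b}$$ with the Python sketch below (the function name and data are hypothetical, and nothing is optimized here). On data generated exactly by $$y = 2x$$, the unregularized objective ($$C = 0$$) prefers the exact weight 2.0, while with $$C = 0.5$$ the slightly smaller weight 1.9 already gives a lower objective, illustrating how the L1 term pushes weights toward zero.

```python
def l1_regularized_objective(w, b, X, Y, C):
    """C * sum_j |w_j| + (1/N) * sum_i (w·x_i + b - y_i)^2, eq. (3) for a linear model."""
    n = len(X)
    mse = sum(
        (sum(wj * xj for wj, xj in zip(w, x)) + b - y) ** 2 for x, y in zip(X, Y)
    ) / n
    return C * sum(abs(wj) for wj in w) + mse


# Hypothetical 1D data generated by y = 2x exactly.
X = [[1.0], [2.0], [3.0]]
Y = [2.0, 4.0, 6.0]

for C in (0.0, 0.5):
    exact = l1_regularized_objective([2.0], 0.0, X, Y, C)
    shrunk = l1_regularized_objective([1.9], 0.0, X, Y, C)
    # With C = 0 the exact fit wins; with C = 0.5 the smaller weight wins.
    print(f"C={C}: w=2.0 -> {exact:.4f}, w=1.9 -> {shrunk:.4f}")
```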

pdf

Annotation 4839874628876

 #knowledge-base-construction #machine-learning Apache Spark allows Snorkel processes to be distributed to many nodes, thus reducing the time for learning.

pdf

Annotation 4846372130060

 #knowledge-base-construction #machine-learning #unfinished Before attention, previous work explored using pooling strategies to train an RNN, such as max pooling [41]. Max pooling compresses the information contained in potentially long input sequences into a fixed-length internal representation by considering all parts of the input sequence impartially. Such compression of information can make it difficult for RNNs to learn from long input sequences.
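A Python sketch of max pooling over time, assuming the RNN produces one hidden vector per input position (the function name and values are hypothetical). Whatever the sequence length, the result has the fixed length of one hidden vector, and every position is treated the same way, which is the compression described above.

```python
def max_pool_over_time(hidden_states):
    """Element-wise maximum over time steps.

    hidden_states: a non-empty list of T vectors, each of length H (one per
    input position). Returns a single vector of length H, whatever T is.
    """
    if not hidden_states:
        raise ValueError("need at least one time step")
    return [max(values) for values in zip(*hidden_states)]


# Hypothetical RNN outputs (H = 2) for a short and a longer sequence.
short = [[0.1, 0.5], [0.9, -0.2]]
longer = [[0.1, 0.5], [0.9, -0.2], [0.3, 0.4], [-0.7, 0.8], [0.2, 0.1]]
print(max_pool_over_time(short))   # [0.9, 0.5]
print(max_pool_over_time(longer))  # [0.9, 0.8]
```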

pdf

Annotation 4846398606604

 #knowledge-base-construction #machine-learning #unfinished Fonduer: we associate the multimodal information in the converted PDF with all extracted words.

pdf

Annotation 4846400179468

 #knowledge-base-construction #machine-learning #unfinished Fonduer aligns the word sequences of the converted PDFs with their original files by checking if both their characters and number of repeated occurrences before the current word are the same.
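The matching rule can be illustrated with a hypothetical Python sketch; it is not Fonduer's implementation. Each word is keyed by its characters plus the number of identical words that precede it, and two words align when their keys are equal.

```python
from collections import defaultdict


def occurrence_keys(words):
    """Key each word by (its characters, how many times it occurred before)."""
    seen = defaultdict(int)
    keys = []
    for word in words:
        keys.append((word, seen[word]))
        seen[word] += 1
    return keys


def align(converted_words, original_words):
    """Map positions in the converted-PDF word sequence to positions in the original.

    Two words align when their characters match and the same number of
    identical words precede them in their respective sequences.
    """
    original_pos = {key: i for i, key in enumerate(occurrence_keys(original_words))}
    return {
        i: original_pos[key]
        for i, key in enumerate(occurrence_keys(converted_words))
        if key in original_pos
    }


# Hypothetical word sequences whose reading orders differ.
converted = ["Part", "No", "Part", "Vcc"]
original = ["Header", "Part", "No", "Vcc", "Part"]
print(align(converted, original))  # {0: 1, 1: 2, 2: 4, 3: 3}
```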

pdf

Annotation 4861081292044

 #machine-learning #software-engineering #unfinished Without care, the resulting system for preparing data in an ML-friendly format may become a jungle of scrapes, joins, and sampling steps, often with intermediate files output. Managing jungle-like data-preparation pipelines, detecting errors, and recovering from failures are all difficult and costly [1]. Testing jungle-like data-preparation pipelines often requires expensive end-to-end integration tests. When testing, detecting errors, and recovering from failures are difficult and costly, they add to the technical debt of a system and make further innovation more costly.

pdf

Annotation 4864528223500

 #machine-learning #software-engineering #unfinished Because of the system-level complexity of machine-learning code, monitoring of system behavior in real time is critical.

pdf

Flashcard 4871284460812

Tags
#knowledge-base-construction #machine-learning #unfinished
Question
In Fonduer, lightweight supervision rules capture a user’s domain knowledge and [...] which are used for training Fonduer’s deep-learning model (see Section 4.3).
programmatically label subsets of candidates,

Parent (intermediate) annotation

nt on feature engineering. Users only need to specify candidates, the potential entries in the target KB, and provide lightweight supervision rules which capture a user’s domain knowledge and programmatically label subsets of candidates, which are used for training Fonduer’s deep-learning model (see Section 4.3).

Original toplevel document (pdf)

Annotation 4871723027724

 In the software development life cycle (SDLC), artifact usually refers to "things" that are produced by people involved in the process. Examples of artifacts include design documents, data models, workflow diagrams, test matrices and plans, and setup scripts. In software, as at an archaeological site, anything that is created could be an artifact.

terminology - What does artifact mean? - Software Engineering Stack Exchange
can call anything produced or created while programming or upon execution, an artifact. – TheLegendaryCopyCoder Jul 21 '17 at 9:35 In software development life cycle (SDLC), artifact usually refers to "things" that are produced by people involved in the process. Examples would be design documents, data models, workflow diagrams, test matrices and plans, setup scripts, ... like an archaeological site, any thing that is created could be an artifact. In most software development cycles, there's usually a list of specific required artifacts that someone must produce and put on a shared drive or document repository for other people to

Annotation 4884582763788

 #bert #knowledge-base-construction #nlp #unfinished In BERT, the input representation of each token is the sum of its token, segment and position embeddings.
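The sum of the three embeddings can be sketched in Python with toy lookup tables. The table contents and the hidden size of 2 are hypothetical (real BERT learns these tables and uses a much larger hidden size); integers are used so the sums are easy to check by hand.

```python
def bert_input_representation(token_ids, segment_ids, token_table, segment_table, position_table):
    """Input vector at position i = token_emb[token_ids[i]] + segment_emb[segment_ids[i]] + position_emb[i]."""
    if len(token_ids) != len(segment_ids):
        raise ValueError("token_ids and segment_ids must have the same length")
    return [
        [t + s + p for t, s, p in zip(token_table[tok], segment_table[seg], position_table[pos])]
        for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids))
    ]


# Toy lookup tables with hidden size 2 (hypothetical values).
token_table = [[1, 0], [0, 1]]             # vocabulary of 2 tokens
segment_table = [[10, 10], [20, 20]]       # sentence A / sentence B
position_table = [[100, 200], [300, 400]]  # positions 0 and 1

print(bert_input_representation([0, 1], [0, 1], token_table, segment_table, position_table))
# [[111, 210], [320, 421]]
```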

pdf

Flashcard 4884627590412

Question

In this sentence, what does "promote" mean?

All these artifacts have to be managed, versioned and promoted through different stages until they’re deployed to production.

The act of copying file content from a less controlled location into a more controlled location.

Parent (intermediate) annotation

Not only do we have to manage the software code artifacts but also the data sets, the machine learning models, and the parameters and hyperparameters used by such models. All these artifacts have to be managed, versioned and promoted through different stages until they’re deployed to production.

Original toplevel document

Sato,Wider,Windheuser_2019_Continuous-delivery_thoughtworks
icient collaboration and alignment. However, this integration also brings new challenges when compared to traditional software development. These include: A higher number of changing artifacts. Not only do we have to manage the software code artifacts but also the data sets, the machine learning models, and the parameters and hyperparameters used by such models. All these artifacts have to be managed, versioned and promoted through different stages until they’re deployed to production. It’s harder to achieve versioning, quality control, reliability, repeatability and audibility in that process. Size and portability: Training data and machine learning models usually co

Flashcard 4920387964172

Tags
#knowledge-base-construction #machine-learning #unfinished
Question
Fonduer: In production, [...] are applied to the entire set of candidates, and learning and inference are performed only once to generate the final KB.
the finalized LFs

Parent (intermediate) annotation

In practice, approximately 20 iterations are adequate for our users to generate a sufficiently tuned set of labeling functions (see Section 6). In production, the finalized LFs are applied to the entire set of candidates, and learning and inference are performed only once to generate the final KB.

Original toplevel document (pdf)

Annotation 4920572251404

 #275 #Cours #Facultaires #Ictère #Médecine #Néonatal #Pédiatrie If phototherapy fails, exchange transfusion may be used, although its indications have become rare. Albumin infusions may be used in vulnerable infants (hypotrophy, prematurity, acidosis, dehydration, multiple medications that may interfere with bilirubin-albumin binding) or while awaiting an exchange transfusion. Intravenous polyvalent immunoglobulins are recommended as an adjunct to intensive phototherapy in cases of jaundice due to documented Rh or ABO maternal-fetal incompatibility.

pdf

Annotation 4920692575500

 #340 #Cours #Facultaires #Inattendue #MIN #Mort #Médecine #Nourisson #Pédiatrie MIN (sudden unexpected infant death), clinical examination of the child: it focuses in particular on assessing the following: rectal temperature; tension of the fontanelle, signs of dehydration and/or severe malnutrition; appearance of the diaper area, skin coloration, lividity, extent of rigidity; skin and/or mucosal marks (rashes, bruises, hematomas, other traumatic lesions, scars). This clinical examination must be performed as early as possible, on a completely undressed child. It may be done in the presence of the parents if they wish.

pdf

Annotation 4921183309068

 A workflow engine is a software application that manages business processes. Workflow engines typically make use of a database server.

Workflow engine - Wikipedia

Annotation 4921271127308

 #machine-learning #software-engineering #unfinished The experimental paradigm in machine learning is reaching its limits. This is challenging the speed of scientific progress in the area.

pdf

Annotation 4955259145484

 [unknown IMAGE 4955290078476] #43 #Cours #Facultaires #Médecine #Pédiatrie #Trisomie #has-images Trisomy 21: craniofacial dysmorphism: moderate microcephaly (around −2 SD); flat occiput, short and broad neck (with excess skin in the neonatal period); round, flat face; small, round ears with a poorly folded helix; hypertelorism (excessive distance between the orbits); palpebral fissures slanting upward and outward, with epicanthus (insertion of the upper eyelid forming a fold that covers the inner canthus); short nose due to hypoplasia of the nasal bones, with a flat nasal bridge (contributing to the epicanthus); small mouth, often held open (because of facial hypotonia); protruding tongue giving an impression of macroglossia; lower jaw becoming prognathic with age.

pdf

Annotation 4956483357964

 #205 #43 #BPCO #Cours #Facultaires #Mucoviscidose #Médecine #Pédiatrie Cystic fibrosis: because milk provides insufficient protein and sodium, salt must be supplemented systematically in infants before dietary diversification (about 2 mEq/kg per day on top of milk). Adequate intake can be checked with a urinary ionogram (urine electrolytes).

pdf

Annotation 4962452901132

 What's hilarious to me is that since the Agile manifesto is so vague, you could say that in many small shops, its "core principles" will organically happen anyway

The Failure of Agile : programming
lmost anything can be considered Agile. Yet most "agile experts" still manage to violate the core principles. Tech_Itch: What's hilarious to me is that since the Agile manifesto is so vague, you could say that its "core principles" will organically happen in many small shops anyway: Individuals and interactions over Processes and tools: Everyone will insist on using their own tools, and fiercely defend their choice. Much time will be spent in "individual interacti

Annotation 4962504805644

 A few business rules can make developing corporate CRUD apps start to feel like a craft. Sometimes the business rules exist only in the minds of senior business users and are not formally documented.

AGILE must be destroyed, once and for all - Erik Meijer : programming
at 1) there's often no connection between the "product owner" and the user community, so adoption fails, and 2) people don't make rational decisions. _georgesim_: Throw a few business rules in there and then it starts feeling like a craft. Bonus points if the business rules are only in the minds of senior business users and not documented formally. JBlitzen: Anyone who's ever used business software knows that the difference between great business software and shitty business softwa

Annotation 4963736358156

 #machine-learning #nlp #unfinished As the gap between the relevant information and the point where it is needed becomes very large, RNNs become unable to learn to connect the information.

Parent (intermediate) annotation

mation suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large. Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.

Original toplevel document

Olah-2015-Understanding_LSTM_Networks-colah,github,io
derstanding of the present frame. If RNNs could do this, they’d be extremely useful. But can they? It depends. Sometimes, we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in “the clouds are in the sky,” we don’t need any further context – it’s pretty obvious the next word is going to be sky. In such cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information. But there are also cases where we need more context. Consider trying to predict the last word in the text “I grew up in France… I speak fluent French.” Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large. Unfortunately, as that gap grows, RNNs become unable to learn to connect the information. In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice
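A commonly cited explanation for this, not spelled out in the excerpt, is the vanishing gradient. For a scalar RNN $$h_t = \tanh(w h_{t-1} + u x_t)$$, the chain rule gives $$\partial h_T / \partial h_0 = \prod_{t=1}^T w (1 - h_t^2)$$. When every factor is smaller than 1, the learning signal linking step $$T$$ back to step 0 shrinks geometrically with the gap. The sketch below assumes a constant input and arbitrary weights `w` and `u`, chosen only for illustration.

```python
import math

def gradient_through_gap(gap, w=0.9, u=0.5, x=1.0):
    """Return d h_gap / d h_0 for a scalar tanh RNN run for `gap` steps.

    h_t = tanh(w * h_{t-1} + u * x), so by the chain rule
    d h_gap / d h_0 = prod_t w * (1 - h_t**2).
    """
    h = 0.0
    grad = 1.0
    for _ in range(gap):
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)  # local derivative d h_t / d h_{t-1}
    return grad

for gap in (1, 5, 20, 50):
    print(gap, gradient_through_gap(gap))
```

With these values each factor is below `w` = 0.9, so the gradient from 50 steps back is smaller than 0.9^50. This is the kind of long-range signal a plain RNN struggles to learn from.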

Annotation 4965730225420

 #9 #Certificats #Cours #Facultaires #Légale #Médecine The box in the "obstacle to body donation" section must be checked in case of: a medico-legal obstacle to burial; a contagious disease. The box in the "sampling to determine the cause of death" section must be checked in case of a suspected contagious disease covered by the "hermetic coffin" and "simple coffin" sections, at the request of: the physician certifying the death; the prefect.

pdf

Flashcard 4965946494220

Tags
#MLBook #expectation #expected-value #machine-learning #statistics
Question
Describe the expectation of a discrete random variable.

Let a discrete random variable $$X$$ have $$k$$ possible values $$\{ x_i \}_{i=1}^k$$. The expectation of $$X$$, denoted as $$\mathbb E[X]$$, is given by

\begin{align} \mathbb E[X] & \stackrel{\textrm{def}}{=} \sum_{i=1}^k \left[ x_i \cdot \textrm{Pr} \left( X = x_i \right) \right] \\ & = x_1 \cdot \textrm{Pr} \left( X = x_1 \right) + x_2 \cdot \textrm{Pr} \left( X = x_2 \right) + \cdots + x_k \cdot \textrm{Pr} \left( X = x_k \right) \end{align}

where $$\textrm{Pr} \left( X = x_i \right)$$ is the probability that $$X$$ has the value $$x_i$$ according to the pmf. The expectation of a random variable is also called the mean, average or expected value and is frequently denoted with the letter $$\mu$$. The expectation is one of the most important statistics of a random variable.
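The definition translates directly into code. In this sketch, the function name `expectation` and the dict-based pmf (mapping each $$x_i$$ to $$\textrm{Pr}(X = x_i)$$) are illustrative choices. It sums $$x_i \cdot \textrm{Pr}(X = x_i)$$ over the $$k$$ values and uses exact fractions, so a fair die gives $$7/2$$ exactly.

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum_i x_i * Pr(X = x_i), with the pmf given as {x_i: Pr(X = x_i)}."""
    if abs(sum(pmf.values()) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

# A fair six-sided die: each of the k = 6 values has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die))  # 7/2
```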

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Let a discrete random variable $$X$$ have $$k$$ possible values $$\{ x_i \}_{i=1}^k$$. The expectation of $$X$$, denoted as $$\mathbb E[X]$$, is given by \begin{align} \mathbb E[X] & \stackrel{\textrm{def}}{=} \sum_{i=1}^k \left[ x_i \cdot \textrm{Pr} \left( X = x_i \right) \right] \\ & = x_1 \cdot \textrm{Pr} \left( X = x_1 \right) + x_2 \cdot \textrm{Pr} \left( X = x_2 \right) + \cdots + x_k \cdot \textrm{Pr} \left( X = x_k \right) \end{align} where $$\textrm{Pr} \left( X = x_i \right)$$ is the probability that $$X$$ has the value $$x_i$$ according to the pmf. The expectation of a random variable is also called the mean, average or expected value and is frequently denoted with the letter $$\mu$$. The expectation is one of the most important statistics of a random variable.

Original toplevel document (pdf)

Flashcard 4966128422156

Tags
#MLBook #SVM #dataset #decision-boundary #has-images #hyperplane #learning-algorithm #machine-learning #margin #model #support-vector-machine #training
Question
Describe how Support Vector Machines work by using a linear model as an example.
[unknown IMAGE 4763872791820]

Let’s say the problem that you want to solve using supervised learning is spam detection. You gather the data, for example, 10,000 email messages, each with a label either “spam” or “not_spam” (you could add those labels manually or pay someone to do that for you). Now, you have to convert each email message into a feature vector.

The data analyst decides, based on their experience, how to convert a real-world entity, such as an email message, into a feature vector. One common way to convert a text into a feature vector, called bag of words, is to take a dictionary of English words (let’s say it contains 20,000 alphabetically sorted words) and stipulate that in our feature vector:

• the first feature is equal to 1 if the email message contains the word “a”; otherwise, this feature is 0;
• the second feature is equal to 1 if the email message contains the word “aaron”; otherwise, this feature equals 0;
• . . .
• the feature at position 20,000 is equal to 1 if the email message contains the word “zulu”; otherwise, this feature is equal to 0.

You repeat the above procedure for every email message in our collection, which gives us 10,000 feature vectors (each vector having the dimensionality of 20,000), each paired with a label (“spam”/“not_spam”).

Now you have machine-readable input data, but the output labels are still in the form of human-readable text. Some learning algorithms require transforming labels into numbers. For example, some algorithms require numbers like 0 (to represent the label “not_spam”) and 1 (to represent the label “spam”). The algorithm I use to illustrate supervised learning is called Support Vector Machine (SVM). This algorithm requires that the positive label (in our case it’s “spam”) has the numeric value of +1 (one), and the negative label (“not_spam”) has the value of −1 (minus one).
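The bag-of-words featurization and the SVM label encoding described above can be sketched in plain Python as follows. The 5-word vocabulary is a hypothetical stand-in for the 20,000-word dictionary. Tokenization is assumed to be a naive lowercase whitespace split, and the helper names are illustrative.

```python
LABELS = {"spam": +1, "not_spam": -1}  # SVM convention from the text

def bag_of_words(text, vocabulary):
    """Binary feature vector: feature j is 1 if vocabulary[j] occurs in the text, else 0.

    Tokenization is a naive lowercase whitespace split (an assumption;
    punctuation is not stripped).
    """
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

def encode_label(label):
    """Map the human-readable label to the numeric value SVM expects."""
    return LABELS[label]

# Hypothetical alphabetically sorted vocabulary standing in for the 20,000 words.
vocabulary = ["a", "aaron", "free", "money", "zulu"]
x = bag_of_words("Free money for a lucky winner", vocabulary)
y = encode_label("spam")
print(x, y)  # [1, 0, 1, 1, 0] 1
```

Applying both functions to every message yields the dataset of feature vectors paired with ±1 labels that the text describes.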

At this point, you have a dataset and a learning algorithm, so you are ready to apply the learning algorithm to the dataset to get the model.

SVM sees every feature vector as a point in a high-dimensional space (in our case, the space is 20,000-dimensional). The algorithm puts all feature vectors on an imaginary 20,000-dimensional plot and draws an imaginary 19,999-dimensional line (a hyperplane) that separates examples with positive labels from examples with negative labels. In machine learning, the boundary separating the examples of different classes is called the decision boundary.

The equation of the hyperplane is given by two parameters, a real-valued vector $$\mathbf w$$ of the same dimensionality as our input feature vector $$\mathbf x$$, and a real number $$b$$ like this:

$$\mathbf w \mathbf x − b = 0$$,

where the expression $$\mathbf w \mathbf x$$ means $$w^{(1)} x^{(1)} + w^{(2)} x^{(2)} + \cdots + w^{(D)} x^{(D)}$$, and $$D$$ is the number of dimensions of the feature vector $$\mathbf x$$.

(If some equations aren’t clear to you right now, in Chapter 2 we revisit the math and statistical concepts necessary to understand them. For the moment, try to get an intuition of what’s happening here. It all becomes more clear after you read the next chapter.)

Now, the predicted label for some input feature vector $$\mathbf x$$ is given like this: $$y = \operatorname{sign} \left( \mathbf w \mathbf x − b \right)$$, where sign is a mathematical operator that takes any value as input and returns +1 if the input is a positive number or −1 if the input is a negative number. The goal of the learning algorithm — SVM in this case — is to leverage the dataset and find the optima

...
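The hyperplane expression and the prediction rule above can be sketched in plain Python. The 3-dimensional `w`, `b` and feature vectors are hypothetical stand-ins for the 20,000-dimensional ones in the text. The text does not define the sign of exactly 0, so this sketch maps it to −1 by assumption.

```python
def decision_value(w, x, b):
    """Return w x - b, where w x = w^(1) x^(1) + ... + w^(D) x^(D)."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimensionality D")
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) - b

def predict(w, x, b):
    """y = sign(w x - b): +1 for "spam", -1 for "not_spam" (0 mapped to -1)."""
    return 1 if decision_value(w, x, b) > 0 else -1

# Hypothetical parameters and feature vectors.
w = [0.5, -1.0, 2.0]
b = 0.5
print(predict(w, [1, 0, 1], b))  # 0.5 + 2.0 - 0.5 = 2.0, so 1
print(predict(w, [0, 1, 0], b))  # -1.0 - 0.5 = -1.5, so -1
```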

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Let’s say the problem that you want to solve using supervised learning is spam detection. You gather the data, for example, 10,000 email messages, each with a label either “spam” or “not_spam” (you could add those labels manually or pay someone to do that for you). Now, you have to convert each email message into a feature vector. The data analyst decides, based on their experience, how to convert a real-world entity, such as an email message, into a feature vector. One common way to convert a text into a feature vector, called bag of words, is to take a dictionary of English words (let’s say it contains 20,000 alphabetically sorted words) and stipulate that in our feature vector: the first feature is equal to 1 if the email message contains the word “a”; otherwise, this feature is 0; the second feature is equal to 1 if the email message contains the word “aaron”; otherwise, this feature equals 0; . . . the feature at position 20,000 is equal to 1 if the email message contains the word “zulu”; otherwise, this feature is equal to 0. You repeat the above procedure for every email message in our collection, which gives us 10,000 feature vectors (each vector having the dimensionality of 20,000), each paired with a label (“spam”/“not_spam”). Now you have machine-readable input data, but the output labels are still in the form of human-readable text. Some learning algorithms require transforming labels into numbers. For example, some algorithms require numbers like 0 (to represent the label “not_spam”) and 1 (to represent the label “spam”). The algorithm I use to illustrate supervised learning is called Support Vector Machine (SVM). This algorithm requires that the positive label (in our case it’s “spam”) has the numeric value of +1 (one), and the negative label (“not_spam”) has the value of −1 (minus one). At this point, you have a dataset and a learning algorithm, so you are ready to apply the learning algorithm to the dataset to get the model. 
SVM sees every feature vector as a point in a high-dimensional space (in our case, the space is 20,000-dimensional). The algorithm puts all feature vectors on an imaginary 20,000-dimensional plot and draws an imaginary 19,999-dimensional line (a hyperplane) that separates examples with positive labels from examples with negative labels. In machine learning, the boundary separating the examples of different classes is called the decision boundary. The equation of the hyperplane is given by two parameters, a real-valued vector $$\mathbf w$$ of the same dimensionality as our input feature vector $$\mathbf x$$, and a real number $$b$$ like this: $$\mathbf w \mathbf x − b = 0$$, where the expression $$\mathbf w \mathbf x$$ means $$w^{(1)} x^{(1)} + w^{(2)} x^{(2)} + \cdots + w^{(D)} x^{(D)}$$, and $$D$$ is the number of dimensions of the feature vector $$\mathbf x$$. (If some equations aren’t clear to you right now, in Chapter 2 we revisit the math and statistical concepts necessary to understand them. For the moment, try to get an intuition of what’s happening here. It all becomes more clear after you read the next chapter.) Now, the predicted label for some input feature vector $$\mathbf x$$ is given like this: $$y = \operatorname{sign} \left( \mathbf w \mathbf x − b \right)$$, where sign is a mathematical operator that takes any value as input and returns +1 if the input is a positive number or −1 if the input is a negative number. The goal of the learning algorithm (SVM in this case) is to leverage the dataset and find the optimal values $$\mathbf w^\ast$$ and $$b^\ast$$ for parameters $$\mathbf w$$ and $$b$$. 
Once the learning algorithm identifies these optimal values, the model $$f(\mathbf x)$$ is then defined as: $$f(\mathbf x) = \operatorname{sign} \left( \mathbf w^\ast \mathbf x − b^\ast \right)$$ Therefore, to predict whether an email message is spam or not spam using an SVM model, you have to take the text of the message, convert it into a feature vector, then multiply this vector by $$\mathbf w^\ast$$, subtract $$b^\ast$$ and take the sign of the result. This gives you the prediction (+1 means “spam”, −1 means “not_spam”). Now, how does the machine find $$\mathbf w^\ast$$ and $$b^\ast$$? It solves an optimization problem. Machines are good at optimizing functions under constraints. So what are the constraints we want to satisfy here? First of all, we want the model to predict the labels of our 10,000 examples correctly. Remember that each example $$i = 1 ,\ldots, 10000$$ is given by a pair $$\left( \mathbf x_i, y_i \right)$$, where $$\mathbf x_i$$ is the feature vector of example $$i$$ and $$y_i$$ is its label that takes values either −1 or +1. So the constraints are naturally: \begin{align} \mathbf w \mathbf x_i − b & \ge +1, \quad \textrm{if} \; y_i = +1, \\ \mathbf w \mathbf x_i − b & \le -1, \quad \textrm{if} \; y_i = -1. \end{align} We would also prefer that the hyperplane separates positive examples from negative ones with the largest margin. The margin is the distance between the closest examples of two classes, as defined by the decision boundary. A large margin contributes to better generalization, that is, to how well the model will classify new examples in the future. To achieve that, we need to minimize the Euclidean norm of $$\mathbf w$$ denoted by $$\Vert \mathbf w \Vert$$ and given by $$\sqrt{\sum_{j=1}^D \left( w^{(j)}\right)^2}$$. So, the optimization problem that we want the machine to solve looks like this: Minimize $$\Vert \mathbf w \Vert$$ subject to $$y_i \left( \mathbf w \mathbf x_i − b \right) \ge 1 \; \textrm{for} \; i = 1 , \ldots , N$$, where $$N$$ is the number of examples (here, 10,000). 
The expression $$y_i \left( \mathbf w \mathbf x_i − b \right) \ge 1$$ is just a compact way to write the above two constraints. The solution of this optimization problem, given by $$\mathbf w^\ast$$ and $$b^\ast$$, is called the statistical model, or, simply, the model. The process of building the model is called training. For two-dimensional feature vectors, the problem and the solution can be visualized as shown in Figure 1. The blue and orange circles represent, respectively, positive and negative examples, and the line given by $$\mathbf w \mathbf x − b = 0$$ is the decision boundary. Why, by minimizing the norm of $$\mathbf w$$, do we find the highest margin between the two classes? Geometrically, the equations $$\mathbf w \mathbf x − b = 1$$ and $$\mathbf w \mathbf x − b = -1$$ define two parallel hyperplanes, as you see in Figure 1. The distance between these hyperplanes is given by $$2/\Vert \mathbf w \Vert$$ , so the smaller the norm $$\Vert \mathbf w \Vert$$, the larger the distance between these two hyperplanes. That’s how Support Vector Machines work. This particular version of the algorithm builds the so-called linear model. It’s called linear because the decision boundary is a straight line (or a plane, or a hyperplane).
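To make the constraints and the margin concrete, here is a small check in plain Python on a hypothetical, linearly separable 2-D dataset. The points and the candidate $$(\mathbf w, b)$$ pairs are made up. The sketch does not solve the optimization problem. It only evaluates the constraint $$y_i(\mathbf w \mathbf x_i − b) \ge 1$$ and the margin $$2/\Vert \mathbf w \Vert$$ for a few candidates.

```python
import math

def dot(w, x):
    return sum(w_j * x_j for w_j, x_j in zip(w, x))

def satisfies_constraints(w, b, examples):
    """True if y_i (w x_i - b) >= 1 holds for every example (x_i, y_i)."""
    return all(y * (dot(w, x) - b) >= 1 for x, y in examples)

def margin(w):
    """Distance 2 / ||w|| between the hyperplanes w x - b = 1 and w x - b = -1."""
    return 2.0 / math.sqrt(dot(w, w))

# Hypothetical, linearly separable 2-D toy data.
examples = [([2.0, 2.0], +1), ([3.0, 3.0], +1),
            ([0.0, 0.0], -1), ([-1.0, 0.0], -1)]

for w, b in [([1.0, 1.0], 2.0), ([0.5, 0.5], 1.0), ([0.4, 0.4], 1.0)]:
    print(w, b, satisfies_constraints(w, b, examples), round(margin(w), 3))
```

Both ([1, 1], 2) and ([0.5, 0.5], 1) satisfy every constraint, and the smaller-norm one has the wider margin (2√2 versus √2). Shrinking w further to [0.4, 0.4] with b = 1 would widen the margin again, but then the point (2, 2) violates its constraint. This is why the norm is minimized subject to the constraints rather than on its own.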

Original toplevel document (pdf)

Flashcard 4967535349004

Question
What's hilarious to me is that since [...], you could say that in many small shops, its "core principles" will organically happen anyway
the Agile manifesto is so vague

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

What's hilarious to me is that since the Agile manifesto is so vague, you could say that in many small shops, its "core principles" will organically happen anyway

Original toplevel document

The Failure of Agile : programming
lmost anything can be considered Agile. Yet most "agile experts" still manage to violate the core principles. Tech_Itch: What's hilarious to me is that since the Agile manifesto is so vague, you could say that its "core principles" will organically happen in many small shops anyway: Individuals and interactions over Processes and tools: Everyone will insist on using their own tools, and fiercely defend their choice. Much time will be spent in "individual interacti

Flashcard 4967536921868

Question
What's hilarious to me is that since the Agile manifesto is so vague, you could say that in [...], its "core principles" will organically happen anyway
many small shops

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

What's hilarious to me is that since the Agile manifesto is so vague, you could say that in many small shops, its "core principles" will organically happen anyway

Original toplevel document

The Failure of Agile : programming
lmost anything can be considered Agile. Yet most "agile experts" still manage to violate the core principles. Tech_Itch: What's hilarious to me is that since the Agile manifesto is so vague, you could say that its "core principles" will organically happen in many small shops anyway: Individuals and interactions over Processes and tools: Everyone will insist on using their own tools, and fiercely defend their choice. Much time will be spent in "individual interacti

Flashcard 4967540067596

Question
Softmax is implemented [...] just before the output layer.
through a neural network layer

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Softmax is implemented through a neural network layer just before the output layer.

Original toplevel document

Multi-Class Neural Networks: Softmax | Machine Learning Crash Course
analysis we saw in Figure 1, Softmax might produce the following likelihoods of an image belonging to a particular class (class: probability): apple: 0.001, bear: 0.04, candy: 0.008, dog: 0.95, egg: 0.001. Softmax is implemented through a neural network layer just before the output layer. The Softmax layer must have the same number of nodes as the output layer. Figure 2. A Softmax layer within a neural network. The Softmax
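As a sketch of what such a layer computes, here is the standard softmax function in plain Python. It maps one raw score per node to a probability, so there are exactly as many outputs as inputs, which matches the rule that the Softmax layer has as many nodes as the output layer. The logits below are made up and do not reproduce the probabilities quoted above. Subtracting the maximum is a common numerical-stability trick that the excerpt does not mention.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    m = max(logits)                           # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]  # the shift cancels out in the ratio below
    total = sum(exps)
    return [e / total for e in exps]

classes = ["apple", "bear", "candy", "dog", "egg"]
logits = [1.0, 2.0, 0.5, 5.0, 1.0]            # hypothetical scores, one per node
probs = softmax(logits)
for name, p in zip(classes, probs):
    print(f"{name}: {p:.3f}")
```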

Flashcard 4967541640460

Question
Softmax is implemented through a neural network layer [...] the output layer.
just before

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Softmax is implemented through a neural network layer just before the output layer.

Original toplevel document

Multi-Class Neural Networks: Softmax | Machine Learning Crash Course
analysis we saw in Figure 1, Softmax might produce the following likelihoods of an image belonging to a particular class (class: probability): apple: 0.001, bear: 0.04, candy: 0.008, dog: 0.95, egg: 0.001. Softmax is implemented through a neural network layer just before the output layer. The Softmax layer must have the same number of nodes as the output layer. Figure 2. A Softmax layer within a neural network. The Softmax

Flashcard 4967543999756

Tags
#machine-learning #software-engineering #unfinished
Question
The [...] in machine learning is reaching its limits. This is challenging the speed of scientific progress in the area.
experimental paradigm

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

The experimental paradigm in machine learning is reaching its limits. This is challenging the speed of scientific progress in the area.

Original toplevel document (pdf)

Flashcard 4967545572620

Tags
#machine-learning #software-engineering #unfinished
Question
The experimental paradigm in machine learning is reaching [...].
its limits

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

The experimental paradigm in machine learning is reaching its limits. This is challenging the speed of scientific progress in the area.

Original toplevel document (pdf)

Flashcard 4967547145484

Tags
#machine-learning #software-engineering #unfinished
Question
The experimental paradigm in machine learning is reaching its limits. This is [...]
challenging the speed of scientific progress.

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

The experimental paradigm in machine learning is reaching its limits. This is challenging the speed of scientific progress in the area.

Original toplevel document (pdf)

Flashcard 4967550553356

Tags
#machine-learning #nlp #unfinished
Question
As the [...] becomes very large, RNNs become unable to learn to connect the information.
gap between the relevant information and the point where it is needed

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

As the gap between the relevant information and the point where it is needed becomes very large, RNNs become unable to learn to connect the information.

Original toplevel document

Olah-2015-Understanding_LSTM_Networks-colah,github,io
derstanding of the present frame. If RNNs could do this, they’d be extremely useful. But can they? It depends. Sometimes, we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in “the clouds are in the sky,” we don’t need any further context – it’s pretty obvious the next word is going to be sky. In such cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information. But there are also cases where we need more context. Consider trying to predict the last word in the text “I grew up in France… I speak fluent French.” Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large. Unfortunately, as that gap grows, RNNs become unable to learn to connect the information. In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice

Flashcard 4967553961228

Tags
#machine-learning #nlp #unfinished
Question
As the gap between the relevant information and the point where it is needed becomes very large, RNNs become [...].
unable to learn to connect the information

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

As the gap between the relevant information and the point where it is needed becomes very large, RNNs become unable to learn to connect the information.

Original toplevel document

Olah-2015-Understanding_LSTM_Networks-colah,github,io
derstanding of the present frame. If RNNs could do this, they’d be extremely useful. But can they? It depends. Sometimes, we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in “the clouds are in the sky,” we don’t need any further context – it’s pretty obvious the next word is going to be sky. In such cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information. But there are also cases where we need more context. Consider trying to predict the last word in the text “I grew up in France… I speak fluent French.” Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large. Unfortunately, as that gap grows, RNNs become unable to learn to connect the information. In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice

Flashcard 4967555534092

Tags
#machine-learning #nlp #unfinished
Question
As the gap between the relevant information and the point where it is needed becomes [...], RNNs become unable to learn to connect the information.
very large

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

As the gap between the relevant information and the point where it is needed becomes very large, RNNs become unable to learn to connect the information.

Original toplevel document

Olah-2015-Understanding_LSTM_Networks-colah,github,io
derstanding of the present frame. If RNNs could do this, they’d be extremely useful. But can they? It depends. Sometimes, we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in “the clouds are in the sky,” we don’t need any further context – it’s pretty obvious the next word is going to be sky. In such cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information. But there are also cases where we need more context. Consider trying to predict the last word in the text “I grew up in France… I speak fluent French.” Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large. Unfortunately, as that gap grows, RNNs become unable to learn to connect the information. In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice

Flashcard 4967558155532

Tags
#bert #knowledge-base-construction #nlp #unfinished
Question
In BERT, the [...] is constructed by the summation of the corresponding token, segment and position embeddings.
input representation of each token

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

In BERT, the input representation of each token is constructed by the summation of the corresponding token, segment and position embeddings.
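Here is a minimal sketch of that summation in plain Python. It assumes small randomly initialized lookup tables in place of BERT's learned ones. The table sizes, token ids, and the helper names `make_table` and `input_representation` are all hypothetical. Each position receives the element-wise sum of its token embedding, its segment (sentence A/B) embedding and its position embedding, so all three tables must share the same width.

```python
import random

def make_table(rows, dim, rng):
    """Hypothetical randomly initialized embedding table (rows x dim)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

def input_representation(token_ids, segment_ids, token_emb, segment_emb, position_emb):
    """Position i gets token_emb[token_ids[i]] + segment_emb[segment_ids[i]] + position_emb[i]."""
    if len(token_ids) != len(segment_ids):
        raise ValueError("token_ids and segment_ids must have the same length")
    reps = []
    for i, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        summed = [t + s + p for t, s, p in
                  zip(token_emb[tok], segment_emb[seg], position_emb[i])]
        reps.append(summed)
    return reps

rng = random.Random(0)
DIM = 4                                # hypothetical hidden size
token_emb = make_table(10, DIM, rng)   # hypothetical 10-token vocabulary
segment_emb = make_table(2, DIM, rng)  # segment A and segment B
position_emb = make_table(8, DIM, rng) # hypothetical maximum length of 8

token_ids = [1, 5, 7, 2]               # hypothetical token ids
segment_ids = [0, 0, 1, 1]             # first two tokens in A, last two in B
reps = input_representation(token_ids, segment_ids, token_emb, segment_emb, position_emb)
```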

Original toplevel document (pdf)

Flashcard 4967561301260

Question
[...] has already partnered with Wikipedia, The New York Times, The Economist, and The Guardian.
Perspective

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Perspective has already partnered with Wikipedia, The New York Times, The Economist, and The Guardian.

Original toplevel document

How Automated Tools Discriminate Against Black Language – MIT Center for Civic Media
ers at the University of Massachusetts have shown that several popular tools for natural language processing (NLP) tend to perform more poorly on AAVE and even misidentify AAVE as non-English. These biases against AAVE become especially worrisome as more platforms use tools like Perspective to moderate online discussions. Perspective has already partnered with Wikipedia, The New York Times, The Economist, and The Guardian. Meanwhile, social media platforms like Facebook have their own automated tools for content moderation — and an unfortunate track record of disabling the accounts of Black activists while doing little about the accounts of white supremacists. There are well-documented problems of content moderation on social media platforms, but as we work to address these problems, I argue that we have to recognize that platforms can have

Flashcard 4967562874124

Question
Perspective has [...] Wikipedia, The New York Times, The Economist, and The Guardian.
already partnered with

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Perspective has already partnered with Wikipedia, The New York Times, The Economist, and The Guardian.

Original toplevel document

How Automated Tools Discriminate Against Black Language – MIT Center for Civic Media
ers at the University of Massachusetts have shown that several popular tools for natural language processing (NLP) tend to perform more poorly on AAVE and even misidentify AAVE as non-English. These biases against AAVE become especially worrisome as more platforms use tools like Perspective to moderate online discussions. Perspective has already partnered with Wikipedia, The New York Times, The Economist, and The Guardian. Meanwhile, social media platforms like Facebook have their own automated tools for content moderation — and an unfortunate track record of disabling the accounts of Black activists while doing little about the accounts of white supremacists. There are well-documented problems of content moderation on social media platforms, but as we work to address these problems, I argue that we have to recognize that platforms can have

Flashcard 4967564446988

Question
Perspective has already partnered with [...], The New York Times, The Economist, and The Guardian.
Wikipedia

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Perspective has already partnered with Wikipedia, The New York Times, The Economist, and The Guardian.

Original toplevel document

How Automated Tools Discriminate Against Black Language – MIT Center for Civic Media

Flashcard 4967566019852

Question
Perspective has already partnered with Wikipedia, [...], The Economist, and The Guardian.
The New York Times

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Perspective has already partnered with Wikipedia, The New York Times, The Economist, and The Guardian.

Original toplevel document

How Automated Tools Discriminate Against Black Language – MIT Center for Civic Media

Flashcard 4967567592716

Question
Perspective has already partnered with Wikipedia, The New York Times, [...], and The Guardian.
The Economist

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Perspective has already partnered with Wikipedia, The New York Times, The Economist, and The Guardian.

Original toplevel document

How Automated Tools Discriminate Against Black Language – MIT Center for Civic Media

Flashcard 4967569165580

Question
Perspective has already partnered with Wikipedia, The New York Times, The Economist, and [...]
The Guardian.

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Perspective has already partnered with Wikipedia, The New York Times, The Economist, and The Guardian.

Original toplevel document

How Automated Tools Discriminate Against Black Language – MIT Center for Civic Media

Annotation 4967572049164

 #knowledge-base-construction #machine-learning #unfinished Before attention, previous work explored using pooling strategies to train an RNN, such as max pooling [41].

Parent (intermediate) annotation

Before attention, previous work explored using pooling strategies to train an RNN, such as max pooling [41]. Max pooling compresses the information contained in potentially long input sequences to a fixed-length internal representation by considering all parts of the input sequence impartially.

Original toplevel document (pdf)

Annotation 4967573622028

 #knowledge-base-construction #machine-learning #unfinished Max pooling compresses the information contained in potentially long input sequences to a fixed-length internal representation by considering all parts of the input sequence impartially.
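A minimal Python sketch of max pooling over time, assuming the RNN's per-step hidden states are already available as equal-length lists of floats (the function name `max_pool_over_time` is hypothetical, not taken from the cited work). However long the sequence, the output has exactly one value per hidden dimension, and each value is simply the peak reached at any step.

```python
from typing import List, Sequence


def max_pool_over_time(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Collapse per-step hidden vectors into one fixed-length vector.

    hidden_states[t][d] is the value of hidden dimension d at time step t.
    Every step is treated alike: only the largest value per dimension
    survives, wherever it occurred in the sequence.
    """
    if not hidden_states:
        raise ValueError("need at least one time step")
    dim = len(hidden_states[0])
    if any(len(h) != dim for h in hidden_states):
        raise ValueError("all hidden vectors must have the same size")
    # zip(*...) regroups the values by dimension; max() keeps each peak.
    return [max(values) for values in zip(*hidden_states)]


short_seq = [[0.1, 0.5, -0.2], [0.4, 0.0, 0.3]]            # 2 steps
long_seq = [[0.1, 0.5, -0.2]] * 1000 + [[0.9, -1.0, 0.0]]  # 1,001 steps

print(max_pool_over_time(short_seq))  # [0.4, 0.5, 0.3]
print(max_pool_over_time(long_seq))   # [0.9, 0.5, 0.0]
```

Here 1,001 steps collapse into 3 numbers: everything except the per-dimension maxima is discarded, which is the kind of compression that can make long input sequences hard to learn from.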

Parent (intermediate) annotation

Before attention, previous work explored using pooling strategies to train an RNN, such as max pooling [41]. Max pooling compresses the information contained in potentially long input sequences to a fixed-length internal representation by considering all parts of the input sequence impartially. Compression of information can make it difficult for RNNs to learn from long input sequences.

Original toplevel document (pdf)

Annotation 4967575194892

 #knowledge-base-construction #machine-learning #unfinished Compression of information can make it difficult for RNNs to learn from long input sequences.

Parent (intermediate) annotation

Max pooling compresses the information contained in potentially long input sequences to a fixed-length internal representation by considering all parts of the input sequence impartially. Compression of information can make it difficult for RNNs to learn from long input sequences.

Original toplevel document (pdf)

Annotation 4967577554188

 A workflow engine is a software application that manages business processes.

Parent (intermediate) annotation

A workflow engine is a software application that manages business processes. Workflow engines typically make use of a database server.

Original toplevel document

Workflow engine - Wikipedia

Annotation 4967579127052

 Workflow engines typically make use of a database server.

Parent (intermediate) annotation

A workflow engine is a software application that manages business processes. Workflow engines typically make use of a database server.

Original toplevel document

Workflow engine - Wikipedia

Annotation 4967580699916

 #knowledge-base-construction #machine-learning #unfinished Fonduer: we introduce a multimodal LSTM network that combines textual context with universal features that correspond to structural and visual properties of the input documents.
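The idea of combining textual context with structural and visual features can be pictured with a minimal sketch, assuming the textual context has already been encoded into a vector (for instance an LSTM's final hidden state) and that the structural and visual properties arrive as named boolean flags. The feature names, function names, and logistic output layer below are hypothetical illustrations of fusion by concatenation, not Fonduer's actual architecture or API.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical structural/visual feature names, for illustration only.
FEATURE_NAMES = ["in_table", "is_bold", "same_row_as_header", "left_aligned"]


def feature_vector(flags: Dict[str, bool]) -> List[float]:
    """Turn named structural/visual flags into a fixed-order 0/1 vector."""
    return [1.0 if flags.get(name, False) else 0.0 for name in FEATURE_NAMES]


def fuse(text_repr: Sequence[float], flags: Dict[str, bool]) -> List[float]:
    """Concatenate the textual representation with the feature vector."""
    return list(text_repr) + feature_vector(flags)


def score(fused: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic output layer applied to the fused representation."""
    if len(weights) != len(fused):
        raise ValueError("one weight per fused dimension is required")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


text_repr = [0.2, -0.1]  # stand-in for an LSTM's final hidden state
fused = fuse(text_repr, {"in_table": True, "is_bold": True})
print(fused)                                  # [0.2, -0.1, 1.0, 1.0, 0.0, 0.0]
print(score(fused, [0.0] * len(fused), 0.0))  # 0.5
```

Concatenation is the simplest way to let a single classifier see both modalities; because the feature vector has a fixed order, every candidate yields an input of the same length.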

Parent (intermediate) annotation

ng deep-learning models [46] tailored for text information extraction (such as long short-term memory (LSTM) networks [18]) struggle to capture the multimodality of richly formatted data. We introduce a multimodal LSTM network that combines textual context with universal features that correspond to structural and visual properties of the input documents. These features are inherently captured by Fonduer’s data model and are generated automatically (see Section 4.2). We also introduce a series of data layout optimizations to ensure the

Original toplevel document (pdf)

Annotation 4967582272780

 #knowledge-base-construction #machine-learning #unfinished Fonduer: structural and visual features are generated automatically (see Section 4.2).

Parent (intermediate) annotation

networks [18]) struggle to capture the multimodality of richly formatted data. We introduce a multimodal LSTM network that combines textual context with universal features that correspond to structural and visual properties of the input documents. These features are inherently captured by Fonduer’s data model and are generated automatically (see Section 4.2). We also introduce a series of data layout optimizations to ensure the scalability of Fonduer to millions of document-wide candidates (see Appendix C).

Original toplevel document (pdf)

Annotation 4967583845644

 #knowledge-base-construction #machine-learning #unfinished We also introduce a series of data layout optimizations to ensure the scalability of Fonduer to millions of document-wide candidates (see Appendix C).

Parent (intermediate) annotation

es that correspond to structural and visual properties of the input documents. These features are inherently captured by Fonduer’s data model and are generated automatically (see Section 4.2). We also introduce a series of data layout optimizations to ensure the scalability of Fonduer to millions of document-wide candidates (see Appendix C).

Original toplevel document (pdf)

Flashcard 4967585680652

Tags
#machine-learning #software-engineering #unfinished
Question
Because of [...] of machine-learning code, live monitoring of system behavior in real time is critical.
the system-level complexity

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Because of the system-level complexity of machine-learning code, live monitoring of system behavior in real time is critical.

Original toplevel document (pdf)

Flashcard 4967587253516

Tags
#machine-learning #software-engineering #unfinished
Question
Because of the system-level complexity of [...], live monitoring of system behavior in real time is critical.
machine-learning code

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Because of the system-level complexity of machine-learning code, live monitoring of system behavior in real time is critical.

Original toplevel document (pdf)

Annotation 4967591447820

 #machine-learning #software-engineering #unfinished Without care, the resulting system for preparing data in an ML-friendly format may become a jungle of scrapes, joins, and sampling steps, often with intermediate files output.

Parent (intermediate) annotation

Without care, the resulting system for preparing data in an ML-friendly format may become a jungle of scrapes, joins, and sampling steps, often with intermediate files output. Managing jungle-like data-preparation pipelines, detecting errors and recovering from failures are all difficult and costly [1]. Testing jungle-like data preparation pipelines often requ

Original toplevel document (pdf)

Annotation 4967593020684

 #machine-learning #software-engineering #unfinished Managing jungle-like data-preparation pipelines, detecting errors and recovering from failures are all difficult and costly [1].

Parent (intermediate) annotation

Without care, the resulting system for preparing data in an ML-friendly format may become a jungle of scrapes, joins, and sampling steps, often with intermediate files output. Managing jungle-like data-preparation pipelines, detecting errors and recovering from failures are all difficult and costly [1]. Testing jungle-like data preparation pipelines often requires expensive end-to-end integration tests. If testing, detecting errors and recovering from failures are difficult and costly,

Original toplevel document (pdf)

Annotation 4967594593548

 #machine-learning #software-engineering #unfinished Testing jungle-like data preparation pipelines often requires expensive end-to-end integration tests.

Parent (intermediate) annotation

joins, and sampling steps, often with intermediate files output. Managing jungle-like data-preparation pipelines, detecting errors and recovering from failures are all difficult and costly [1]. Testing jungle-like data preparation pipelines often requires expensive end-to-end integration tests. If testing, detecting errors and recovering from failures are difficult and costly, they add to technical debt of a system and make further innovation more costly.

Original toplevel document (pdf)

Annotation 4967596166412

 #machine-learning #software-engineering #unfinished If testing, detecting errors and recovering from failures are difficult and costly, this adds to the technical debt of a system and makes further innovation more costly.

Parent (intermediate) annotation

n pipelines, detecting errors and recovering from failures are all difficult and costly [1]. Testing jungle-like data preparation pipelines often requires expensive end-to-end integration tests. If testing, detecting errors and recovering from failures are difficult and costly, they add to technical debt of a system and make further innovation more costly.

Original toplevel document (pdf)

Flashcard 4967599574284

Tags
#knowledge-base-construction #machine-learning #unfinished
Question
Fonduer [...] of the converted PDFs with their original files.
aligns the word sequences

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Fonduer aligns the word sequences of the converted PDFs with their original files by checking if both their characters and number of repeated occurrences before the current word are the same.
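The matching rule can be sketched in Python, assuming both the converted PDF and the original file have already been tokenized into lists of words. `occurrence_keys` and `align` are hypothetical names, and this illustrates the rule as stated rather than Fonduer's own code: each word is keyed by its characters plus the number of identical words seen before it, so a repeated word such as a recurring "Part" lines up with the correct copy.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A word is identified by its characters plus how many identical words precede it.
Key = Tuple[str, int]


def occurrence_keys(words: List[str]) -> List[Key]:
    """Pair each word with the number of earlier occurrences of the same word."""
    seen: Counter = Counter()
    keys: List[Key] = []
    for word in words:
        keys.append((word, seen[word]))
        seen[word] += 1
    return keys


def align(converted: List[str], original: List[str]) -> Dict[int, int]:
    """Map word positions in the converted PDF to positions in the original file.

    Two words match only if their characters are identical and the same
    number of identical words precedes each of them.
    """
    original_index = {key: i for i, key in enumerate(occurrence_keys(original))}
    mapping: Dict[int, int] = {}
    for i, key in enumerate(occurrence_keys(converted)):
        j = original_index.get(key)
        if j is not None:
            mapping[i] = j
    return mapping


converted = ["Part", "No.", "BC547", "Part", "No.", "BC548"]
original = ["Datasheet", "Part", "No.", "BC547", "Part", "No.", "BC548"]
print(align(converted, original))  # {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6}
```

A word that appears in only one of the two sequences (for example, an extraction error) has no matching key and is simply left unaligned.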

Original toplevel document (pdf)

Flashcard 4967601147148

Tags
#knowledge-base-construction #machine-learning #unfinished
Question
Fonduer aligns the word sequences of the [...] by checking if both their characters and number of repeated occurrences before the current word are the same.
converted PDFs with their original files

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Fonduer aligns the word sequences of the converted PDFs with their original files by checking if both their characters and number of repeated occurrences before the current word are the same.

Original toplevel document (pdf)

Annotation 4967605079308

 In the software development life cycle (SDLC), "artifact" usually refers to "things" that are produced by people involved in the process.

Parent (intermediate) annotation

In the software development life cycle (SDLC), "artifact" usually refers to "things" that are produced by people involved in the process. Examples of artifacts would be design documents, data models, workflow diagrams, test matrices and plans, setup scripts. In software, as at an archaeological site, any thing that is crea

Original toplevel document

terminology - What does artifact mean? - Software Engineering Stack Exchange
can call anything produced or created while programming or upon execution, an artifact. – TheLegendaryCopyCoder Jul 21 '17 at 9:35. In software development life cycle (SDLC), artifact usually refers to "things" that are produced by people involved in the process. Examples would be design documents, data models, workflow diagrams, test matrices and plans, setup scripts, ... like an archaeological site, any thing that is created could be an artifact. In most software development cycles, there's usually a list of specific required artifacts that someone must produce and put on a shared drive or document repository for other people to

Annotation 4967606652172

 Examples of artifacts would be design documents, data models, workflow diagrams, test matrices and plans, setup scripts.

Parent (intermediate) annotation

In the software development life cycle (SDLC), "artifact" usually refers to "things" that are produced by people involved in the process. Examples of artifacts would be design documents, data models, workflow diagrams, test matrices and plans, setup scripts. In software, as at an archaeological site, anything that is created could be an artifact.

Original toplevel document

terminology - What does artifact mean? - Software Engineering Stack Exchange

Annotation 4967608225036

 In software, as at an archaeological site, anything that is created could be an artifact.

Parent (intermediate) annotation

refers to "things" that are produced by people involved in the process. Examples of artifacts would be design documents, data models, workflow diagrams, test matrices and plans, setup scripts. In software, as at an archaeological site, anything that is created could be an artifact.

Original toplevel document

terminology - What does artifact mean? - Software Engineering Stack Exchange

Flashcard 4967613467916

Tags
#knowledge-base-construction #machine-learning
Question
Apache Spark allows Snorkel processes to [...], thus reducing the time for learning.
be distributed to many nodes

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Apache Spark allows Snorkel processes to be distributed to many nodes, thus reducing the time for learning.

Original toplevel document (pdf)

Flashcard 4967615040780

Tags
#knowledge-base-construction #machine-learning
Question
Apache Spark allows Snorkel processes to be distributed to many nodes, thus [...] learning.
reducing the time for

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Apache Spark allows Snorkel processes to be distributed to many nodes, thus reducing the time for learning.

Original toplevel document (pdf)

Flashcard 4967616613644

Tags
#knowledge-base-construction #machine-learning
Question
[...] allows Snorkel processes to be distributed to many nodes, thus reducing the time for learning.
Apache Spark

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Apache Spark allows Snorkel processes to be distributed to many nodes, thus reducing the time for learning.

Original toplevel document (pdf)

Annotation 4967619497228

 A few business rules can make developing corporate CRUD apps start feeling like a craft.

Parent (intermediate) annotation

A few business rules can make developing corporate CRUD apps start feeling like a craft. Sometimes business rules are only in the minds of senior business users and not documented formally.

Original toplevel document

AGILE must be destroyed, once and for all - Erik Meijer : programming
at 1) there's often no connection between the "product owner" and the user community, so adoption fails, and 2) people don't make rational decisions. _georgesim_: Throw a few business rules in there and then it starts feeling like a craft. Bonus points if the business rules are only in the minds of senior business users and not documented formally. JBlitzen: Anyone who's ever used business software knows that the difference between great business software and shitty business softwa

Annotation 4967621070092

 Sometimes business rules are only in the minds of senior business users and not documented formally.

Parent (intermediate) annotation

A few business rules can make developing corporate CRUD apps start feeling like a craft. Sometimes business rules are only in the minds of senior business users and not documented formally.

Original toplevel document

AGILE must be destroyed, once and for all - Erik Meijer : programming

Annotation 4967623429388

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie Under the law of 11 February 2005 for the equal rights and opportunities, participation and citizenship of disabled persons, disability is defined as follows: "A disability is any limitation of activity or restriction of participation in life in society experienced in their environment by a person because of a substantial, lasting or permanent impairment of one or more physical, sensory, mental, cognitive or psychological functions, a multiple disability (polyhandicap) or a disabling health condition."

pdf

Annotation 4967624477964

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie The prevalence of disability in children has not decreased over recent decades. In France as elsewhere, the proportion of children with an impairment is close to 2.5%, all disabilities combined.

pdf

Annotation 4967625526540

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie Finally, neurodevelopmental disorders, all causes combined, account for 45% of chronic diseases in children (source: CNAMTS).

pdf

Annotation 4967626575116

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie Compensation for a child's disability is provided by the allocation pour l'éducation de l'enfant handicapé (AEEH, education allowance for a disabled child) and the prestation de compensation du handicap (PCH, disability compensation benefit), but also by the range of services and places available in medico-social institutions.

pdf

Annotation 4967627623692

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie Maison départementale des personnes handicapées (MDPH, departmental center for disabled persons): it provides a single point of access to the rights and benefits available to disabled persons. It informs and supports disabled persons and their families from the announcement of the disability and throughout its course. It organizes the Commission des droits et de l'autonomie des personnes handicapées (CDAPH, commission for the rights and autonomy of disabled persons) and monitors the implementation of its decisions; it also manages the departmental disability compensation fund.

pdf

Annotation 4967628672268

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie Principles for the placement of a disabled child: a child recognized as disabled must be able to receive a free special education combining medical, paramedical, social, educational and psychological interventions.

pdf

Annotation 4967639158028

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie The right to schooling is part of the projet personnalisé de scolarisation (PPS, personalized schooling plan), which is drawn up together with: the educational team (in which the school doctor should play an essential role); the parents; a referring teacher from the MDPH; the care teams. Parents are closely involved in drawing up this personalized plan and in the placement decision, which the CDAPH takes in agreement with them.

pdf

Annotation 4967640206604

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie AVS are non-teaching staff made available to schools to work with a child who has a disability. This measure is decided by the MDPH's commission for rights and autonomy after review of the child's file. Finally, all examinations and competitive exams organized by the Éducation nationale offer extended and reinforced accommodations for disabled candidates (an extra third of the allotted time, a secretarial assistant…).

pdf

Annotation 4967641255180

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie In primary school, ULIS École units (unités localisées pour l'inclusion scolaire, localized units for school inclusion) take in at most 12 children.

pdf

Annotation 4967642303756

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie ULIS Collège units ensure continuity with ULIS École and take in 10 pupils aged 11 to 16.

pdf

Annotation 4967643352332

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie In collège (lower secondary school), the SEGPA (sections d'enseignement général et professionnel adapté, adapted general and vocational education sections) take in pupils with severe and persistent learning difficulties. This adapted teaching aims at a vocational qualification. After the 3e (final year of collège), the pupil is then directed towards: a lycée professionnel (vocational high school); a centre d'apprentis (apprentice training center); an établissement régional d'enseignement adapté (EREA, regional adapted-education institution).

pdf

Annotation 4967644400908

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie CAMSP: for children aged 0 to 6, the CAMSP (centres d'action médicosociale précoce, early medico-social action centers) provide screening, outpatient treatment and rehabilitation for children with sensory, intellectual or motor deficits, with a view to their social and educational adaptation in their natural environment and with the participation of their families.

pdf

Annotation 4967645449484

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie CAMSP: this type of care does not require a referral by the MDPH; access is direct, at the request of the family or of physicians.

pdf

Annotation 4967646498060

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie Other services not requiring an MDPH referral: other services may also be involved. For children aged 3 to 18 with psycho-affective or psychomotor disorders or learning disorders: the CMPP (centres médicopsychopédagogiques, medico-psycho-pedagogical centers). For children with psychiatric disorders: the CMP (centres médicopsychologiques, medico-psychological centers), whose role is essentially therapeutic.

pdf

Annotation 4967647546636

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie Rehabilitation by psychologists, psychomotor therapists and occupational therapists in private practice is not covered by the Sécurité sociale; these costs can only be offset by the allowances paid through the MDPH (AEEH).

pdf

Annotation 4967648595212

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie Medico-social support services requiring an MDPH referral: for children aged 0 to 20, various support services exist. They fall into several categories according to the disability: SESSAD (services d'éducation spéciale et de soins à domicile, special education and home-care services) for children with intellectual and motor impairments or character and behavior disorders; SSAD (services d'aides et de soins à domicile, home help and care services) for children with a multiple disability combining motor impairment and severe and profound intellectual impairment; SAFEP (services d'accompagnement familial et d'éducation précoce, family support and early education services) for children aged 0 to 3 with hearing and visual impairments; SSFIS (services de soutien à l'éducation familiale et à l'intégration scolaire, support services for family education and school integration) for hearing-impaired children over 3; SAAIS (services d'aide à l'acquisition de l'autonomie et à l'intégration scolaire, services supporting the acquisition of autonomy and school integration) for visually impaired children over 3.

pdf

Annotation 4967649643788

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie Although every disabled child can be enrolled in "the school or secondary institution of their neighborhood", there may be limits to this inclusion. The pediatrician must make sure that the child does not pay an unrecognized price for their adaptation to school: constant effort, a feeling of never doing enough and of always having to do more.

pdf

Annotation 4967667993868

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie Various institutions provide long-term care for all or part of a disabled child's educational, rehabilitative and psychological needs. Access is through the CDAPH of the MDPH. They are mainly: IME (instituts médico-éducatifs, medico-educational institutes) for children aged 0 to 20, with separate institutions for children with an intellectual impairment, a motor impairment, a multiple disability, a severe hearing impairment, or a severe visual impairment or blindness; IMPRO (instituts médico-professionnels, medico-vocational institutes) after age 14, to provide vocational training; IR (instituts de rééducation, rehabilitation institutes) for children with severe school difficulties associated with behavioral disorders; IEM (instituts d'éducation motrice, motor education institutes) for children with severe motor impairment.

pdf

Annotation 4967669042444

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie Social care relies above all on medical certificates, which must be precise, clear, concise and contain relevant information (the MDPH certificate and the ALD [affections longue durée, long-term conditions] certificate). These certificates are covered by medical confidentiality. Disabled children are exempt from the ticket modérateur (the patient's share of costs), with 100% coverage of their health costs. This requires being a social insurance holder or a dependent beneficiary (spouse, dependent children).

pdf

Annotation 4967670091020

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie The award of the allocation d'éducation de l'enfant handicapé (AEEH) and its supplements depends on the following conditions: it is paid to any person responsible for a disabled child under 20 years of age, if the child's incapacity rate is at least 80% (loss of autonomy for most activities of daily living), or between 50 and 80% if the child attends a specialized institution as a day pupil or half-boarder (externat or semi-internat) or is followed by a SESSAD. Only the AEEH grants exemption from the forfait hospitalier (daily hospital charge).

pdf

cannot see any pdfs

Annotation 4967671139596

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie The carte d'invalidité (disability card) is granted when the level of impairment is 80% or higher. It provides various benefits: a GIC badge (macaron GIC), exemption from the TV licence fee, home-help costs, and free transport for the accompanying person. It falls under the authority of the MDPH.

pdf

Annotation 4967672188172

 #54 #Cours #Facultaires #Handicap #Médecine #Pédiatrie The allocation journalière de présence parentale (AJPP, daily parental presence allowance) is granted when the child has an illness or a disability, or has had an accident, that makes sustained parental presence and demanding care essential. The leave lasts 310 days, to be taken over 3 years according to the child's care needs. The leave cannot be combined with the complément d'éducation spéciale received for the same child. The AJPP can, however, be combined with the basic AEEH.

pdf

Annotation 4967681101068

 #54 #Cours #Enfant #Facultaires #Handicap #MPR #Médecine Disability or chronic illness in children is a common situation in France. Cerebral palsy is the leading cause of motor disability. Cerebral palsy results from a brain lesion occurring in the antenatal or perinatal period. The main risk factor for cerebral palsy is prematurity. Trisomy 21 and fetal alcohol syndrome (FAS) are the leading causes of intellectual disability of genetic and non-genetic origin, respectively. Dyslexia-dysorthographia is the leading cause of specific learning disorders. Pervasive developmental disorders are the leading cause of disability of psychiatric origin.

pdf

Annotation 4967682149644

 #54 #Cours #Enfant #Facultaires #Handicap #MPR #Médecine A disabled child is a child who does not have the activities or the participation expected, according to the International Classification of Functioning, Disability and Health (ICF), for his or her age group in the society in which he or she lives.

pdf

Annotation 4967683198220

 #54 #Cours #Enfant #Facultaires #Handicap #MPR #Médecine Sur-handicap (secondary disability) is the addition of secondary impairments or behavioral disorders to a pre-existing disability. The initial disability can cause relational or learning difficulties and thereby make the disability worse.

pdf

Annotation 4967684246796

 #54 #Cours #Enfant #Facultaires #Handicap #MPR #Médecine Polyhandicap is a severe disability combining serious, lasting impairments with severe or profound intellectual disability.

pdf

Annotation 4967685295372

 #54 #Cours #Enfant #Facultaires #Handicap #MPR #Médecine However, rehabilitation can begin before the etiological diagnosis of a chronic condition is precisely known. Likewise, a diagnosis may be reconsidered in the face of an unusual clinical course. For example, clinical worsening of spastic diplegia attributed to cerebral palsy should prompt a search for another, progressive degenerative disease.

pdf

Annotation 4967686343948

 #54 #Cours #Enfant #Facultaires #Handicap #MPR #Médecine One of the generic scales commonly used to measure autonomy is the MIF-môme (mesure d'indépendance fonctionnelle, a functional independence measure for children aged 0 to 8 covering basic activities, mobility and manipulation, and language and cognition).

pdf

Annotation 4967703645452

 #54 #Cours #Enfant #Facultaires #Handicap #MPR #Médecine In 2012, the Haute Autorité de santé (HAS) issued the following recommendations to improve the transition from childhood to adulthood. Care: gradually make the young person the main interlocutor; education about health and their specific medical needs. Administrative framework: social assessment and access to the guide to administrative procedures before the age of majority. Social participation: linking school and vocational projects, choice of place of living, associations of disabled people. Medico-social framework: a named referent and a written transition programme.

pdf

Annotation 4967704694028

 #54 #Cours #Enfant #Facultaires #Handicap #MPR #Médecine Motor: mainstream schooling ± private-practice rehabilitation, OR ± CAMPS then SESSD; IEM OR IME if mainstream schooling is impossible.
Cognitive: mainstream schooling ± human and material support; CLIS then ULIS, or SEGPA, if a regular class is impossible; IME if mainstream schooling is impossible.
Psychiatric: mainstream schooling ± CMP OR CMPP; IME if mainstream schooling is impossible.
Sensory: mainstream schooling ± human and material support; IES or certain EREAs.
Polyhandicap: IME or IEM.

pdf

Annotation 4967705742604

 #54 #Cours #Enfant #Facultaires #Handicap #MPR #Médecine The financial aids for disability compensation are the allocation d'éducation de l'enfant handicapé (AEEH) and the prestation de compensation du handicap (PCH). Parents can increase the time they spend with their child by applying for a congé de présence parentale (parental presence leave) or the allocation journalière de présence parentale (AJPP). Exemption from the ticket modérateur and the carte d'invalidité are two applications that the physician must complete.

pdf

Flashcard 4967732481292

Tags
#MLBook #machine-learning #sample-mean
Question
It can be shown that an unbiased estimator of an unknown $$\mathbb E \left[ X \right]$$ (given by either eq. 1 or eq. 2) is given by [...].
$$\frac{1}{N} \sum_{i=1}^N x_i$$ (called in statistics the sample mean)
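A minimal Python sketch of the sample mean, with a small simulation. The fair six-sided die (for which $$\mathbb E[X] = 3.5$$), the seed, and the names `sample_mean`, `N` and `trials` are illustrative assumptions, not from the text. Averaging many sample means only illustrates unbiasedness empirically; it does not prove it.

```python
import random

def sample_mean(xs):
    """(1/N) * sum_{i=1}^N x_i, the sample mean of N observations."""
    if len(xs) == 0:
        raise ValueError("the sample must contain at least one value")
    return sum(xs) / len(xs)

# Hypothetical setup: X is a fair six-sided die, so E[X] = 3.5.
rng = random.Random(0)  # fixed seed for reproducibility
N = 5                   # size of each sample
trials = 20_000         # number of independent samples

# Average the sample mean over many independent samples; for an unbiased
# estimator this average should approach E[X] as the number of trials grows.
avg_of_means = sum(
    sample_mean([rng.randint(1, 6) for _ in range(N)]) for _ in range(trials)
) / trials
print(avg_of_means)
```

Even with a tiny sample size such as N = 5, the average of the sample means stays close to 3.5; a larger N only reduces the spread of each individual estimate.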

Parent (intermediate) annotation

It can be shown that an unbiased estimator of an unknown $$\mathbb E \left[ X \right]$$ (given by either eq. 1 or eq. 2) is given by $$\frac{1}{N} \sum_{i=1}^N x_i$$ (called in statistics the sample mean).

Original toplevel document (pdf)

Flashcard 4967736675596

Tags
#MLBook #hard-margin-SVM #has-images #hinge-loss #machine-learning #noise #soft-margin-SVM #support-vector-machine
Question
Describe how to deal with noise in Support Vector Machine.

To extend SVM to cases in which the data is not linearly separable, we introduce the hinge loss function: $$\max (0, 1 − y_i (\mathbf w \mathbf x_i − b))$$.

The hinge loss function is zero if the constraints in eq. 8 [i.e., $$\mathbf w \mathbf x_i − b \ge +1 \; \textrm{if} \; y_i = +1$$ and $$\mathbf w \mathbf x_i − b \le -1 \; \textrm{if} \; y_i = -1$$] are satisfied; in other words, if $$\mathbf x_i$$ lies on the correct side of the decision boundary, at or beyond the margin. For data on the wrong side of the decision boundary, the function’s value grows linearly with the distance from the decision boundary.

We then wish to minimize the following cost function,

$$C \left\Vert \mathbf w \right\Vert^2 + \frac{1}{N} \displaystyle \sum_{i=1}^N \max (0, 1 − y_i (\mathbf w \mathbf x_i − b))$$,

where the hyperparameter $$C$$ determines the tradeoff between increasing the size of the margin and ensuring that each $$\mathbf x_i$$ lies on the correct side of the decision boundary. The value of $$C$$ is usually chosen experimentally, just like ID3’s hyperparameters $$\epsilon$$ and $$d$$. SVMs that optimize hinge loss are called soft-margin SVMs, while the original formulation is referred to as a hard-margin SVM.

As you can see, for sufficiently high values of $$C$$, the second term in the cost function will become negligible, so the SVM algorithm will try to find the highest margin by completely ignoring misclassification. As we decrease the value of $$C$$, making classification errors becomes more costly, so the SVM algorithm tries to make fewer mistakes by sacrificing the margin size. As we have already discussed, a larger margin is better for generalization. Therefore, $$C$$ regulates the tradeoff between classifying the training data well (minimizing empirical risk) and classifying future examples well (generalization).
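A minimal Python sketch of the hinge loss and the soft-margin cost exactly as written above, using the text's $$\mathbf w \mathbf x_i - b$$ sign convention. The function names and the three toy points (one beyond the margin, one inside it, one misclassified) are illustrative assumptions; this only evaluates the cost and does not optimize it.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def hinge_loss(w, b, x, y):
    # max(0, 1 - y_i (w x_i - b)); zero when y_i (w x_i - b) >= 1
    return max(0.0, 1.0 - y * (dot(w, x) - b))

def soft_margin_cost(w, b, X, Y, C):
    # C * ||w||^2 + (1/N) * sum_i max(0, 1 - y_i (w x_i - b))
    regularizer = C * dot(w, w)
    avg_hinge = sum(hinge_loss(w, b, x, y) for x, y in zip(X, Y)) / len(X)
    return regularizer + avg_hinge

w, b = [1.0, 0.0], 0.0
X = [[2.0, 0.0],   # correct side, beyond the margin -> loss 0
     [0.5, 0.0],   # correct side, inside the margin -> loss 0.5
     [-1.0, 0.0]]  # wrong side -> loss 2.0
Y = [1, 1, 1]
print([hinge_loss(w, b, x, y) for x, y in zip(X, Y)])  # [0.0, 0.5, 2.0]
print(soft_margin_cost(w, b, X, Y, C=0.1))
```

Note that in this formulation $$C$$ multiplies $$\left\Vert \mathbf w \right\Vert^2$$; some libraries, such as scikit-learn's `SVC`, place their `C` on the loss term instead, so a larger `C` there means weaker regularization.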

Parent (intermediate) annotation

To extend SVM to cases in which the data is not linearly separable, we introduce the hinge loss function: $$\max (0, 1 − y_i (\mathbf w \mathbf x_i − b))$$. The hinge loss function is zero if the constraints in eq. 8 [i.e., $$\mathbf w \mathbf x_i − b \ge +1 \; \textrm{if} \; y_i = +1$$ and $$\mathbf w \mathbf x_i − b \le -1 \; \textrm{if} \; y_i = -1$$] are satisfied; in other words, if $$\mathbf x_i$$ lies on the correct side of the decision boundary, at or beyond the margin. For data on the wrong side of the decision boundary, the function’s value grows linearly with the distance from the decision boundary. We then wish to minimize the following cost function, $$C \left\Vert \mathbf w \right\Vert^2 + \frac{1}{N} \displaystyle \sum_{i=1}^N \max (0, 1 − y_i (\mathbf w \mathbf x_i − b))$$, where the hyperparameter $$C$$ determines the tradeoff between increasing the size of the margin and ensuring that each $$\mathbf x_i$$ lies on the correct side of the decision boundary. The value of $$C$$ is usually chosen experimentally, just like ID3’s hyperparameters $$\epsilon$$ and $$d$$. SVMs that optimize hinge loss are called soft-margin SVMs, while the original formulation is referred to as a hard-margin SVM. As you can see, for sufficiently high values of $$C$$, the second term in the cost function will become negligible, so the SVM algorithm will try to find the highest margin by completely ignoring misclassification. As we decrease the value of $$C$$, making classification errors becomes more costly, so the SVM algorithm tries to make fewer mistakes by sacrificing the margin size. As we have already discussed, a larger margin is better for generalization. Therefore, $$C$$ regulates the tradeoff between classifying the training data well (minimizing empirical risk) and classifying future examples well (generalization).

Original toplevel document (pdf)

Flashcard 4967740083468

Tags
#MLBook #cosine-similarity #k-nearest-neighbors #kNN #machine-learning
Question
Describe the k-Nearest Neighbors (kNN) learning algorithm.

k-Nearest Neighbors (kNN) is a non-parametric learning algorithm. Contrary to other learning algorithms that allow discarding the training data after the model is built, kNN keeps all training examples in memory. Once a new, previously unseen example $$\mathbf x$$ comes in, the kNN algorithm finds $$k$$ training examples closest to $$\mathbf x$$ and returns the majority label, in case of classification, or the average label, in case of regression.

The closeness of two examples is given by a distance function. For example, the Euclidean distance seen above is frequently used in practice. Another popular choice of distance function is the negative cosine similarity. Cosine similarity, defined as,

$$s \left( \mathbf x_i, \mathbf x_k \right) \stackrel{\textrm{def}}{=} \cos \left( \angle \left( \mathbf x_i, \mathbf x_k \right) \right) = \frac{\sum_{j = 1}^D x_i^{(j)} x_k^{(j)}}{\sqrt{\sum_{j=1}^D \left( x_i^{(j)}\right)^2} \sqrt{\sum_{j=1}^D \left( x_k^{(j)}\right)^2}}$$,

is a measure of how similar the directions of two vectors are. If the angle between two vectors is 0 degrees, the two vectors point in the same direction, and their cosine similarity is equal to 1. If the vectors are orthogonal, the cosine similarity is 0. For vectors pointing in opposite directions, the cosine similarity is −1. To use cosine similarity as a distance metric, we need to multiply it by −1. Other popular distance metrics include Chebyshev distance, Mahalanobis distance, and Hamming distance. The distance metric and the value of $$k$$ are chosen by the analyst before running the algorithm, so they are hyperparameters. The distance metric could also be learned from data (as opposed to guessing it). We talk about that in Chapter 10.
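A minimal pure-Python sketch of kNN as described above: all training examples are kept, the $$k$$ closest ones are found, and the majority label (classification) or the average label (regression) is returned. The function names, the tie-breaking rule and the toy data are illustrative assumptions. Cosine similarity is undefined for a zero vector, so the cosine example avoids one.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def cosine_similarity(a, b):
    # Undefined (division by zero) if either vector is all zeros
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

def negative_cosine(a, b):
    # Multiply the similarity by -1 so that "closer" means "smaller"
    return -cosine_similarity(a, b)

def knn_predict(X_train, y_train, x, k, distance=euclidean, regression=False):
    # No model is built: every prediction scans all stored training examples
    ranked = sorted(zip(X_train, y_train), key=lambda pair: distance(pair[0], x))
    labels = [label for _, label in ranked[:k]]
    if regression:
        return sum(labels) / len(labels)  # average label
    # Majority label; on a tie, Counter keeps the label seen first (the closer one)
    return Counter(labels).most_common(1)[0][0]

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, [0.2, 0.2], k=3))  # prints a
```

Sorting all examples costs $$O(N \log N)$$ per query; that keeps the sketch short, but a heap or a spatial index would be the usual choice for larger datasets.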

Parent (intermediate) annotation

k-Nearest Neighbors (kNN) is a non-parametric learning algorithm. Contrary to other learning algorithms that allow discarding the training data after the model is built, kNN keeps all training examples in memory. Once a new, previously unseen example $$\mathbf x$$ comes in, the kNN algorithm finds $$k$$ training examples closest to $$\mathbf x$$ and returns the majority label, in case of classification, or the average label, in case of regression. The closeness of two examples is given by a distance function. For example, the Euclidean distance seen above is frequently used in practice. Another popular choice of distance function is the negative cosine similarity. Cosine similarity, defined as, $$s \left( \mathbf x_i, \mathbf x_k \right) \stackrel{\textrm{def}}{=} \cos \left( \angle \left( \mathbf x_i, \mathbf x_k \right) \right) = \frac{\sum_{j = 1}^D x_i^{(j)} x_k^{(j)}}{\sqrt{\sum_{j=1}^D \left( x_i^{(j)}\right)^2} \sqrt{\sum_{j=1}^D \left( x_k^{(j)}\right)^2}}$$, is a measure of how similar the directions of two vectors are. If the angle between two vectors is 0 degrees, the two vectors point in the same direction, and their cosine similarity is equal to 1. If the vectors are orthogonal, the cosine similarity is 0. For vectors pointing in opposite directions, the cosine similarity is −1. To use cosine similarity as a distance metric, we need to multiply it by −1. Other popular distance metrics include Chebyshev distance, Mahalanobis distance, and Hamming distance. The distance metric and the value of $$k$$ are chosen by the analyst before running the algorithm, so they are hyperparameters. The distance metric could also be learned from data (as opposed to guessing it). We talk about that in Chapter 10.

Original toplevel document (pdf)

Flashcard 4967742967052

Tags
#MLBook
Question
Let’s start by telling the truth: machines don’t learn. What a typical “learning machine” does is [...]
finding a mathematical formula, which, when applied to a collection of inputs (called “training data”), produces the desired outputs. This mathematical formula also generates the correct outputs for most other inputs (distinct from the training data) on the condition that those inputs come from the same or a similar statistical distribution as the one the training data was drawn from.

Parent (intermediate) annotation

Let’s start by telling the truth: machines don’t learn. What a typical “learning machine” does is finding a mathematical formula, which, when applied to a collection of inputs (called “training data”), produces the desired outputs. This mathematical formula also generates the correct outputs for most other inputs (distinct from the training data) on the condition that those inputs come from the same or a similar statistical distribution as the one the training data was drawn from.

Original toplevel document (pdf)

Flashcard 4967750044940

Tags
#MLBook #SVM #has-images #machine-learning #non-linearity
Question
Describe how SVM can be adapted to work with datasets that cannot be separated by a hyperplane in their original space, like the one shown in Figure 5 (right).

SVM can be adapted to work with datasets that cannot be separated by a hyperplane in its original space. Indeed, if we manage to transform the original space into a space of higher dimensionality, we could hope that the examples will become linearly separable in this transformed space. In SVMs, using a function to implicitly transform the original space into a higher dimensional space during the cost function optimization is called the kernel trick.

The effect of applying the kernel trick is illustrated in Figure 6. As you can see, it’s possible to transform two-dimensional, non-linearly-separable data into linearly separable three-dimensional data using a specific mapping $$\phi: \mathbf x \mapsto \phi (\mathbf x)$$, where $$\phi (\mathbf x)$$ is a vector of higher dimensionality than $$\mathbf x$$. For the 2D data in Figure 5 (right), the mapping $$\phi$$ that projects a 2D example $$\mathbf x = \left[ q, p \right]$$ into a 3D space (Figure 6) would look like this: $$\phi \left( \left[ q, p \right] \right) \stackrel{\textrm{def}}{=} \left( q^2, \sqrt{2} qp, p^2\right)$$, where $$\cdot^2$$ means $$\cdot$$ squared. You can now see that the data becomes linearly separable in the transformed space.
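A minimal Python sketch of the mapping $$\phi$$ above. Since Figure 5 is not reproduced here, it assumes hypothetical "ring" data: an inner circle of radius 0.5 and an outer circle of radius 2, which no straight line in 2D can separate. It also checks the standard identity $$\phi(\mathbf x) \cdot \phi(\mathbf z) = (\mathbf x \cdot \mathbf z)^2$$, which is what lets an SVM work in the 3D space implicitly. The names `phi`, `quadratic_kernel` and `side` are illustrative.

```python
import math

def phi(x):
    # Explicit map from the text: [q, p] -> (q^2, sqrt(2)*q*p, p^2)
    q, p = x
    return (q * q, math.sqrt(2) * q * p, p * p)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def quadratic_kernel(x, z):
    # Equals phi(x) . phi(z), but is computed in 2D without building 3D vectors
    return dot(x, z) ** 2

def circle(radius, step_degrees=30):
    return [(radius * math.cos(math.radians(a)), radius * math.sin(math.radians(a)))
            for a in range(0, 360, step_degrees)]

inner, outer = circle(0.5), circle(2.0)  # hypothetical two-class ring data

def side(x):
    # In phi-space, z1 + z3 = q^2 + p^2 (the squared radius), so the plane
    # z1 + z3 = 1 is a linear decision boundary between the two rings.
    z = phi(x)
    return z[0] + z[2] - 1.0

print(all(side(x) < 0 for x in inner), all(side(x) > 0 for x in outer))
```

The plane $$z_1 + z_3 = 1$$ is only one of many separating planes for this toy data; an SVM would pick the one with the largest margin.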

Parent (intermediate) annotation

SVM can be adapted to work with datasets that cannot be separated by a hyperplane in their original space. Indeed, if we manage to transform the original space into a space of higher dimensionality, we could hope that the examples will become linearly separable in this transformed space. In SVMs, using a function to implicitly transform the original space into a higher dimensional space during the cost function optimization is called the kernel trick. The effect of applying the kernel trick is illustrated in Figure 6. As you can see, it’s possible to transform two-dimensional, non-linearly-separable data into linearly separable three-dimensional data using a specific mapping $$\phi: \mathbf x \mapsto \phi (\mathbf x)$$, where $$\phi (\mathbf x)$$ is a vector of higher dimensionality than $$\mathbf x$$. For the 2D data in Figure 5 (right), the mapping $$\phi$$ that projects a 2D example $$\mathbf x = \left[ q, p \right]$$ into a 3D space (Figure 6) would look like this: $$\phi \left( \left[ q, p \right] \right) \stackrel{\textrm{def}}{=} \left( q^2, \sqrt{2} qp, p^2\right)$$, where $$\cdot^2$$ means $$\cdot$$ squared. You can now see that the data becomes linearly separable in the transformed space.

Original toplevel document (pdf)

Flashcard 4967757647116

Tags
#MLBook #cores #decision-trees #learning-algorithm-selection #linear-regression #logistic-regression #machine-learning #neural-networks #random-forests #training-speed
Question
Discuss training speed as a criterion for choosing a machine learning algorithm.
How much time is a learning algorithm allowed to use to build a model? Neural networks are known to be slow to train. Simple algorithms like logistic and linear regression or decision trees are much faster. Specialized libraries contain very efficient implementations of some algorithms; you may prefer to do research online to find such libraries. Some algorithms, such as random forests, benefit from the availability of multiple CPU cores, so their model building time can be significantly reduced on a machine with dozens of cores.

Parent (intermediate) annotation

• Training speed

How much time is a learning algorithm allowed to use to build a model? Neural networks are known to be slow to train. Simple algorithms like logistic and linear regression or decision trees are much faster. Specialized libraries contain very efficient implementations of some algorithms; you may prefer to do research online to find such libraries. Some algorithms, such as random forests, benefit from the availability of multiple CPU cores, so their model building time can be significantly reduced on a machine with dozens of cores.

Original toplevel document (pdf)

Flashcard 4967763676428

Tags
#MLBook #SVM #deep-neural-networks #ensemble-algorithms #learning-algorithm-selection #linear-regression #logistic-regression #machine-learning #non-linearity
Question
• Nonlinearity of the data

Is your data linearly separable or can it be modeled using a linear model? If yes, [...] can be good choices. Otherwise, deep neural networks or ensemble algorithms, discussed in Chapters 6 and 7, might work better.

SVM with the linear kernel, logistic or linear regression

Parent (intermediate) annotation

• Nonlinearity of the data

Is your data linearly separable or can it be modeled using a linear model? If yes, SVM with the linear kernel, logistic or linear regression can be good choices. Otherwise, deep neural networks or ensemble algorithms, discussed in Chapters 6 and 7, might work better.

Original toplevel document (pdf)

Flashcard 4967765249292

Tags
#MLBook #SVM #deep-neural-networks #ensemble-algorithms #learning-algorithm-selection #linear-regression #logistic-regression #machine-learning #non-linearity
Question
• Nonlinearity of the data

Is your data linearly separable or can it be modeled using a linear model? If yes, SVM with the linear kernel, logistic or linear regression can be good choices. Otherwise, [...] or ensemble algorithms, discussed in Chapters 6 and 7, might work better.

deep neural networks

Parent (intermediate) annotation

• Nonlinearity of the data

Is your data linearly separable or can it be modeled using a linear model? If yes, SVM with the linear kernel, logistic or linear regression can be good choices. Otherwise, deep neural networks or ensemble algorithms, discussed in Chapters 6 and 7, might work better.

Original toplevel document (pdf)

Flashcard 4967767084300

Tags
#MLBook #continuous-random-variable #expectation #machine-learning
Question

The expectation of a continuous random variable $$X$$ is given by,

[...] .

$$\mathbb E \left[ X \right] \stackrel{\textrm{def}}{=} \int_{\mathbb R} x f_X \left( x \right) dx,$$

where $$f_X$$ is the pdf of the variable $$X$$ and $$\int_{\mathbb R}$$ denotes the integral of the function $$x f_X$$ over the whole real line $$\mathbb R$$.
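A minimal numerical sketch of this definition. It assumes an exponential pdf $$f_X(x) = \lambda e^{-\lambda x}$$ with $$\lambda = 2$$, for which $$\mathbb E[X] = 1/\lambda = 0.5$$, and truncates $$\mathbb R$$ to $$[0, 40]$$, where the neglected mass of this pdf is negligible. The midpoint rule and the function names are illustrative choices, not the book's method.

```python
import math

def expectation_continuous(pdf, a, b, n=200_000):
    """Approximate E[X] = integral of x * f_X(x) dx with the midpoint rule on [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x * pdf(x)
    return total * h

def exponential_pdf(x, rate=2.0):
    # f_X(x) = rate * exp(-rate * x) for x >= 0, and 0 otherwise
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

mean = expectation_continuous(exponential_pdf, 0.0, 40.0)
print(mean)  # close to 1 / rate = 0.5
```

The sum replaces the integral in the same way the discrete definition replaces it with a weighted sum over the $$k$$ possible values.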

Parent (intermediate) annotation

The expectation of a continuous random variable $$X$$ is given by $$\mathbb E \left[ X \right] \stackrel{\textrm{def}}{=} \int_{\mathbb R} x f_X \left( x \right) dx,$$ where $$f_X$$ is the pdf of the variable $$X$$ and $$\int_{\mathbb R}$$ denotes the integral of the function $$x f_X$$ over the whole real line $$\mathbb R$$.

Original toplevel document (pdf)

Annotation 4967768657164

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL The main function of the tympano-ossicular system is impedance matching of sound waves passing from the air to the fluid medium of the inner ear. Without it, hearing loss is about 50 to 55 dB.

pdf

Annotation 4967769705740

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL The pars tensa, which is semi-transparent, has one main landmark: the handle of the malleus. The pars flaccida lies above the pars tensa, separated from it by the anterior and posterior tympanomalleal ligaments.

pdf

Annotation 4967770754316

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL The outer ear consists of the auricle (pinna) and the external auditory canal (CAE). Its main functions are: mechanical protection of the tympano-ossicular system by the anatomical angle between the cartilaginous and bony parts of the canal; amplification of conversational frequencies (especially between 2 and 4 kHz) due to resonance in the CAE; and sound localization (especially vertical, due to the contours of the auricle).

pdf

Annotation 4967771802892

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL The cochlea performs transduction, i.e. the conversion of mechanical energy (the sound wave travelling through the inner-ear fluids from the base to the apex of the cochlea) into electrical energy transmitted by the cochlear nerve.

pdf

Annotation 4967772851468

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL The cochlea is tonotopically organized (high frequencies toward the base of the cochlea, low frequencies toward the apex).

pdf

Annotation 4967773900044

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL To improve frequency selectivity, the cochlea also uses active mechanisms: the outer hair cells (CCE) have intrinsic motility (the basis of otoacoustic-emission techniques), which very locally amplifies the vibration and hence the transduction by the inner hair cells (CCI).

pdf

Annotation 4967775735052

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL A disorder of the outer and/or middle ear, if it causes hearing loss, produces a conductive hearing loss: hearing levels are then better by bone conduction than by air conduction (the basis of the Rinne and Weber tuning-fork tests). When the inner ear or the cochlear nerve is affected, the result is a sensorineural (perceptive) hearing loss: hearing levels by bone conduction and by air conduction are the same, which is a pure sensorineural hearing loss.

pdf

Annotation 4967776783628

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Loss between 0 and 20 dB: normal or near-normal hearing.
Loss between 20 and 40 dB: mild loss. Speech is understood at normal levels, but soft voices are difficult to hear.
Loss between 40 and 70 dB: moderate loss. Speech is perceived if it is loud.
Loss between 70 and 90 dB: severe loss. Speech is perceived only at very high levels; lip-reading is a necessary complement.
Loss above 90 dB: profound loss. Understanding speech is almost impossible; major language-acquisition problems in young children.

pdf

Annotation 4967799328012

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Weber test. The Weber test consists of placing a vibrating tuning fork on the skull equidistant from both ears (forehead or vertex). If the patient hears the sound in both ears or diffusely, the Weber is said to be indifferent. If the patient hears the sound in one ear, the Weber is said to be lateralized toward the ear where the sound is perceived: the Weber lateralizes toward the deaf ear in conductive hearing loss, and toward the healthy ear in sensorineural hearing loss.

pdf

Annotation 4967800376588

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Rinne test. The Rinne test compares the intensity of the sound perceived by the patient from a vibrating tuning fork held in front of the auricle (air conduction, CA) and placed on the mastoid (bone conduction, CO): Rinne = CA − CO. The tuning fork is first applied to the mastoid; then, when the patient no longer perceives the sound, it is placed in front of the auricle. In the absence of a conductive disorder, the patient should continue to hear the sound longer by air than by bone: this is a positive acoumetric Rinne (CA − CO > 0). If the patient no longer perceives the sound, this is a negative acoumetric Rinne (CA − CO < 0). With normal hearing or sensorineural hearing loss, the Rinne is positive. With conductive hearing loss, the Rinne is negative. Each ear is tested separately.

pdf

Annotation 4967802211596

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Pure-tone audiometry. It is based on sound stimulation with pure tones of varying frequency (Hz) and intensity (dB), determining the subjective hearing threshold by air conduction (headphones) and by bone conduction (mastoid vibrator). With normal hearing or sensorineural hearing loss, the bone- and air-conduction curves are superimposed; by analogy with acoumetry, the Rinne is said to be positive. In conductive hearing loss, bone conduction is better than air conduction: the Rinne is negative.

pdf

Annotation 4967803260172

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Speech audiometry. It uses stimulation with complex sounds, most often meaningful (monosyllabic or disyllabic words, sentences) and sometimes meaningless (logatomes: vowel-consonant-vowel). Lists of disyllabic words are the most widely used in clinical practice.

pdf

Annotation 4967804308748

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Impedance audiometry (tympanometry) measures the impedance of the middle ear and its changes under positive or negative pressure created in the external auditory canal. It can only be performed when the eardrum is not perforated. It provides objective information on the function of the Eustachian tube and of the tympano-ossicular system.

pdf

Annotation 4967805357324

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Tympanometry curves. Middle-ear ventilation disorder: type C curve; the compliance peak is shifted toward negative pressures, so there is negative pressure in the tympanic cavity. Fluid effusion in the tympanic cavity: type B curve. Physical characteristics of the tympano-ossicular system, ossicular damage: "Eiffel Tower" curve, a tall, sharp peak due to disruption of the ossicular chain.

pdf

Annotation 4967807192332

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Stapedial reflex. This is the recording of the contraction of the stapedius muscle in response to a suprathreshold acoustic stimulus (> 80 dB), by measuring the change in impedance of the tympano-ossicular system (with impedance audiometry). Note that the change in impedance caused by contraction of the stapedius muscle cannot occur in certain disorders (otosclerosis).

pdf

Annotation 4967808240908

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Early auditory evoked potentials (PEA), known as brainstem auditory evoked potentials. The principle of PEA is to record, with surface electrodes, electrical potentials that arise at different levels of the nervous system in response to an acoustic stimulus.

pdf

Annotation 4967809289484

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL PEA: a non-invasive test (signal recorded with skin electrodes) of twofold interest. Otological: objective measurement of the hearing threshold, accurate to within 10–15 dB, from birth; it is a means of objective audiometry in children (or in non-cooperative subjects). Otoneurological: topographic localization of the auditory lesion in sensorineural hearing loss by studying the latencies and conduction delays of the five waves: I (cochlea), II (auditory nerve), III, IV and V (brainstem).

pdf

Annotation 4967810338060

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL PEA: its limitations are as follows. It does not allow a frequency-by-frequency study of the responses. It explores the high-frequency range of audiometry (and therefore not the low frequencies). The severity of the hearing loss can hamper interpretation of the curves for latency analysis.

pdf

Annotation 4967811386636

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Les cellules ciliées internes (CCI) sont les seuls récepteurs sensoriels de l'audition, alors que les cellules ciliées externes (CCE) possèdent des propriétés micromécaniques : elles agissent de façon mécanique sur la membrane basilaire.

pdf

Annotation 4967812435212

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Recording evoked otoacoustic emissions (OEAP) is a simple, fast (one minute) and reliable way to explore the function of the outer hair cells (CCE), which are known to be the first to disappear when the cochlea is damaged.

pdf

Annotation 4967813483788

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL OEA (otoacoustic emissions): in adults, they allow detection of subclinical cochlear damage (ototoxic treatment, monitoring of occupational hearing loss, acoustic trauma…). The presence of otoacoustic emissions neither rules out hearing loss due to auditory neuropathy nor guarantees that a child will not develop hearing loss later.

pdf

Annotation 4967842057484

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Clinical features: conductive hearing loss may have the following characteristics. It may be unilateral or bilateral. It is mild or moderate: the maximum audiometric loss is 60 dB. It causes no qualitative change in the voice. Intelligibility is often better in noise (paracusis) and on the telephone. It may or may not be accompanied by tinnitus, which is then rather low-pitched, not very bothersome and well localized in the affected ear. The patient's own voice may resonate in the ear (autophony), and patients do not raise their voice. In children it may be accompanied by delayed language development.

pdf

Annotation 4967843106060

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Conductive hearing loss: suprathreshold tests and speech audiometry show no qualitative impairment of hearing (distortion).

pdf

Annotation 4967844154636

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Conductive hearing loss: always has a negative Rinne; does not cause sound distortion; is never total.

pdf

Annotation 4967849921804

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Otosclerosis: an osteodystrophy of the otic (labyrinthine) capsule of multifactorial origin (genetic, hormonal, viral…). Eight percent of white individuals are histologically affected. It becomes clinically apparent in one person in 1,000.
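The two prevalence figures in this card can be combined into a rough proportion. Assuming that both figures refer to the same population (the card does not say so), the share of histologically affected people who develop clinical disease is:

$$\frac{1/1000}{8/100} = \frac{0.001}{0.08} = 0.0125 = \frac{1}{80}$$

Under that assumption, roughly one histological case in 80 becomes clinically apparent.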

pdf

Annotation 4967850970380

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Otosclerosis should be suspected immediately in any conductive hearing loss in a young adult, particularly a woman (two women for every man), that appears without any previous ear history and with a normal eardrum.

pdf

Annotation 4967852018956

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Otosclerosis: in women, the hearing loss worsens in flare-ups during the milestones of reproductive life (puberty, pregnancy, breastfeeding, menopause).

pdf

Annotation 4967853067532

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL A normal CT scan does not rule out otosclerosis.

pdf

Annotation 4967854116108

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Otosclerosis: the stapedial reflex is abolished in complete ankylosis. In the early stages, an "on-off" effect may be seen, which is almost pathognomonic of early stapedovestibular ankylosis. The "on-off" effect is a transient increase in compliance at the start ("on") and at the end ("off") of the stimulation. So instead of a deflection of the needle in the positive direction during stapedial reflex testing, two successive deflections in the negative direction are seen.

pdf

Annotation 4967855164684

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Otosclerosis: treatment is primarily surgical. The stapes is removed (stapedectomy) or a central hole is made in the footplate (stapedotomy), and the continuity of the ossicular chain is restored with a prosthetic device. The stapes prosthesis transmits vibrations between the incus and the inner ear, bypassing the stapedial ankylosis. Results are excellent: hearing is restored in 95% of cases.

pdf

Annotation 4967856213260

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sequelae of otitis: this hearing loss is usually stable, sometimes progressive (labyrinthization through progressive involvement of the inner ear).

pdf

Annotation 4967857261836

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sequelae of otitis: it is often surgically curable by tympanoplasty. For a simple perforation of the eardrum, a myringoplasty can be performed. If the ossicles are also involved, surgery that restores a functional tympano-ossicular system is required.

pdf

Annotation 4967858310412

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sequelae of otitis: results are poorer than in otosclerosis (socially adequate functional rehabilitation in 50 to 70% of cases).

pdf

Annotation 4967859358988

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Ear aplasia: a congenital malformation of the outer and/or middle ear, of genetic or acquired origin (rubella or toxic embryopathies).

pdf

Annotation 4967860407564

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Ear aplasia: a pure conductive hearing loss (the inner ear is usually normal, because it has a different embryological origin). It is stable and does not progress.

pdf

Annotation 4967861456140

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Ear aplasia: it is surgically curable, but the surgery is difficult and specialized. The indication for surgery is debatable in unilateral forms, because these cause little or no functional impairment. Surgery cannot be considered before the age of 7, and only after a CT work-up.

pdf

Annotation 4967862504716

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Aplasia of the auricle (pinna) requires surgical reconstruction after the age of 8.

pdf

Annotation 4967863553292

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Traumatic hearing loss: temporal bone fractures that reach the middle ear cause a conductive hearing loss. It is reversible in the case of a simple hemotympanum. It is permanent when the tympano-ossicular system is damaged: tympanic perforation, fracture, or ossicular dislocation. Repair then uses tympanoplasty techniques (with ossiculoplasty if needed), performed at a later stage after the injury.

pdf

Annotation 4967864601868

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Recall that conductive hearing loss: is inconstant in acute otitis media and usually resolves with it; is the main sign of otitis media with effusion behind an intact eardrum (otitis media with effusion is the most common cause of conductive hearing loss in children, and a tympanostomy tube is effective); may be the first and only sign of a middle-ear cholesteatoma.

pdf

Annotation 4967865650444

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Hearing loss of tumoral origin: hearing losses caused by tumors are very rare. Causes include tympanojugular glomus tumor and carcinomas of the external auditory canal (CAE) and middle ear.

pdf

Annotation 4967866699020

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Otosclerosis is the most common conductive hearing loss. Conductive hearing loss is surgically curable in a large number of cases (hearing surgery). A hearing aid is easy to fit and effective in conductive hearing loss.

pdf

Annotation 4967900777740

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Clinical features: sensorineural hearing loss may have the following characteristics. It may be unilateral or bilateral. Its severity varies from mild loss to total deafness (cophosis). When bilateral and severe, it makes patients raise their voice ("shouting like a deaf person"). The hearing difficulty is revealed or worsened in noisy settings and in conversations with several people (the "cocktail party" sign). It may or may not be accompanied by tinnitus, which tends to be high-pitched (whistling), poorly tolerated and more or less well localized in the ear. It may be accompanied by vertigo and/or balance disorders (labyrinthine or nerve involvement). In children it comes with delayed or disordered language.

pdf

Annotation 4967901826316

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sensorineural hearing loss: the loss generally predominates on high-pitched sounds. The exception is Ménière's disease, where the loss affects all frequencies or predominates on the low frequencies.

pdf

Annotation 4967902874892

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sensorineural hearing loss: in inner-ear lesions, suprathreshold tests and speech audiometry show qualitative alterations of hearing affecting pitch (diplacusis), loudness (recruitment) and timbre. These qualitative alterations are usually absent in lesions of the VIII nerve.

pdf

Annotation 4967903923468

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sensorineural hearing loss: objective audiometry, by recording early auditory evoked potentials, often provides useful information for topographic diagnosis (inner ear, VIII nerve, neural pathways).

pdf

Annotation 4967904972044

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sensorineural hearing loss: can be total (cophosis); always has a positive Rinne; causes sound distortion. Auditory evoked potentials often allow a topographic diagnosis.
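The conductive and sensorineural hearing loss cards contrast three findings: the Rinne result, sound distortion and total loss. The minimal Python sketch below encodes only those statements as a lookup. The names `HearingLossProfile` and `compatible_types` are hypothetical. It is a study aid under these simplifying assumptions, not a diagnostic rule, and it ignores every other clinical element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HearingLossProfile:
    """Hypothetical summary of three findings in a patient with hearing loss."""
    rinne_positive: bool    # Rinne test result on the affected side
    sound_distortion: bool  # qualitative distortion (pitch, loudness, timbre)
    total_loss: bool        # complete deafness (cophosis)


def compatible_types(p: HearingLossProfile) -> set[str]:
    """Return the hearing-loss types whose card description p does not contradict."""
    types: set[str] = set()
    # Conductive: Rinne always negative, no distortion, never total.
    if not p.rinne_positive and not p.sound_distortion and not p.total_loss:
        types.add("conductive")
    # Sensorineural: Rinne always positive, may be total. Distortion is typical of
    # inner-ear lesions but usually absent in VIII nerve lesions, so it is not required.
    if p.rinne_positive:
        types.add("sensorineural")
    return types


print(compatible_types(HearingLossProfile(False, False, False)))  # {'conductive'}
print(compatible_types(HearingLossProfile(True, True, False)))    # {'sensorineural'}
```

An empty result (for example, a negative Rinne with distortion) means the profile matches neither card as written.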

pdf

Annotation 4967906020620

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sensorineural hearing loss, sudden unilateral hearing loss (SUB): like "a bolt from the blue", sudden hearing loss, which is as a rule unilateral, occurs abruptly within seconds or minutes. It is accompanied by unilateral whistling tinnitus and sometimes by vertigo or balance disorders.

pdf

Annotation 4967907069196

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sensorineural hearing loss, sudden unilateral hearing loss: on the basis of history findings that are usually subtle, one can sometimes only suspect a cause. It may be viral (rhinopharyngitis a few days earlier, seasonal pattern) or vascular (older patient, presence of risk factors or vascular disease). The functional prognosis is poor (50 to 75% do not recover), especially if the hearing loss is severe or profound and if treatment is delayed or not given.

pdf

Annotation 4967908117772

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sudden unilateral hearing loss is considered a medical emergency.

pdf

Annotation 4967909166348

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sensorineural hearing loss, SUB: medical treatment may be attempted in the first hours or days. Its effectiveness is debated, but it is nil after day 8–10.

pdf

Annotation 4967910214924

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Ten percent of patients with sudden hearing loss have an acoustic neuroma. It must always be looked for in sudden unilateral hearing loss (PEA or contrast-enhanced MRI).

pdf

Annotation 4967911263500

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sensorineural hearing loss, SUB, medical treatment: whatever the suspected cause, it may include the following. First, corticosteroid therapy for 6 to 8 days, combined in varying ways with vasodilator infusions, hyperbaric oxygen therapy and hemodilution. Second, a lighter follow-on treatment that can be continued for several weeks (vasodilators…).

pdf

Annotation 4967912312076

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Any progressive or fluctuating post-traumatic sensorineural hearing loss should suggest a perilymphatic fistula.

pdf

Annotation 4967913360652

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Infectious hearing loss: labyrinthitis. Three types are distinguished. First, otogenic labyrinthitis, caused by spread of a middle-ear infection: acute otitis media, or cholesteatoma of the ear with a fistula of the lateral semicircular canal or a trans-footplate breach at the oval window. These can regress fully or partly with vigorous, early antibiotic and corticosteroid treatment. Second, hematogenous neurolabyrinthitis, which may be microbial (syphilis, exceptional) but is mostly viral: mumps (unilateral hearing loss), herpes zoster oticus (involvement of the VIII nerve) and other neurotropic viruses. Third, neurolabyrinthitis following meningitis (mainly bacterial). The hearing loss is, as a rule, irreversible and incurable.

pdf

Annotation 4967914409228

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Any progressive unilateral hearing loss in an adult without an obvious cause should suggest an acoustic neuroma.

pdf

Annotation 4967915457804

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL VIII nerve neuroma (acoustic neuroma): the onset is insidious. It most often takes the form of a unilateral sensorineural hearing loss in an adult that progresses slowly and is usually noticed by chance. Tinnitus is inconstant, and balance disorders are mild and inconstant. A VIII nerve neuroma sometimes reveals itself through a sudden, unilateral symptom: sudden hearing loss or facial paralysis.

pdf

Annotation 4967916506380

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL The diagnostic steps are as follows. Clinical examination looks for unilateral corneal hypoesthesia, spontaneous vestibular signs, and provoked vestibular signs (head-shaking test, vibration test, Halmagyi head-impulse test).

pdf

Annotation 4967940361484

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL VIII nerve neuroma, functional cochleovestibular testing: pure-tone and speech audiometry (sensorineural hearing loss with severely reduced intelligibility); auditory evoked potentials, an essential and reliable functional test (prolonged latencies on the affected side indicate retrocochlear involvement); caloric and otolith tests (unilateral vestibular deficit). Imaging: MRI of the internal auditory canal (CAI) and posterior fossa with gadolinium injection.

pdf

Annotation 4967941410060

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Hearing loss of genetic origin, progressive disease of young adults: a cochlear sensorineural hearing loss, as a rule bilateral, that develops gradually in young adults and worsens over time, sometimes very quickly. It may be accompanied by bilateral tinnitus. The functional handicap is dramatic in these patients, who are in the middle of their working lives. It does not respond to any medical or surgical treatment. Vasodilators are traditionally prescribed, but their effectiveness is questionable.

pdf

Annotation 4967942458636

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Hearing loss in young adults: a genetic origin is often suspected (autosomal dominant).

pdf

Annotation 4967943507212

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Age-related hearing loss, or presbycusis: this is not a disease but a normal aging process that affects all the neurosensory structures of the auditory system (inner ear, neural pathways and centers). The process starts very early, around age 25 (loss of the highest frequencies of the auditory field), but intelligibility is not affected for a long time.

pdf

Annotation 4967944555788

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Presbycusis becomes socially apparent from age 65 as a growing difficulty with verbal communication. This difficulty can be much greater than the pure-tone audiogram suggests when the inner-ear damage is combined with impaired frequency selectivity (from outer hair cell damage) and impaired cortical integration of the verbal message.

pdf

Annotation 4967945604364

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Presbycusis: a hearing aid, ideally bilateral, is a valuable aid if prescribed early (from a bilateral loss of 30 dB at 2,000 Hz). Its effectiveness improves if speech therapy is also prescribed.
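The early-fitting criterion in this card reduces to a simple check on the 2,000 Hz thresholds. The minimal sketch below assumes that "bilateral loss of 30 dB" means each ear has lost at least 30 dB at that frequency. `hearing_aid_indicated` is a hypothetical name.

```python
def hearing_aid_indicated(left_loss_2k_db: float, right_loss_2k_db: float,
                          threshold_db: float = 30.0) -> bool:
    """Presbycusis card: fit early from a bilateral loss of 30 dB at 2,000 Hz."""
    # "Bilateral" is read as: both ears reach the threshold at 2,000 Hz.
    return left_loss_2k_db >= threshold_db and right_loss_2k_db >= threshold_db


print(hearing_aid_indicated(35, 30))  # True
print(hearing_aid_indicated(40, 20))  # False
```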

pdf

Annotation 4967946652940

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Noise trauma: the alarm threshold for harmful noise exposure is 85 dB for 8 hours a day. Impulsive sounds and high-pitched sound spectra are the most harmful.

pdf

Annotation 4967947701516

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Noise trauma: the first signs of hearing loss are audiometric, namely a bilateral notch (auditory scotoma) at 4,000 Hz. The loss then spreads gradually, like an oil stain, towards the high frequencies and the speech frequencies. At that point hearing difficulty appears and then worsens. The hearing loss stops progressing once the person is removed from the noisy environment.

pdf

Annotation 4967948750092

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Acute, accidental trauma: a sudden, violent noise (explosion…) can damage the inner ear and cause a bilateral hearing loss affecting, or predominating at, 4,000 Hz. It is often accompanied by ringing in the ears and sometimes by vertigo. It may regress fully or partly. It requires emergency medical treatment, the same as for sudden unilateral hearing loss.

pdf

Annotation 4967949798668

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Toxic hearing loss is bilateral when the drug is given systemically, and it predominates on the high frequencies. It is irreversible and incurable. As a rule, the drugs involved are aminoglycosides, which are ototoxic to the cochlea and/or the vestibule. The newer aminoglycosides are less ototoxic than streptomycin and tend to affect the vestibule rather than the cochlea.

pdf

Annotation 4967950847244

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Ototoxicity: other drugs implicated are diuretics (furosemide, which potentiates aminoglycoside ototoxicity); antimitotics (cisplatin, nitrogen mustard); quinine and its derivatives; retinoids; and certain industrial substances: CO (carbon monoxide), Hg (mercury), Pb (lead)…

pdf

Annotation 4968125435148

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Auditory disorders due to lesions of the central auditory pathways, in upper brainstem or subcortical and cortical lesions, do not deserve to be called hearing loss. They do not present as reduced hearing but as disorders of auditory recognition (gnosis): the patient hears (normal pure-tone audiogram) but does not understand (abnormal speech audiogram). Lesions of the central auditory pathways often cause no hearing complaint at all (for example, multiple sclerosis or a brainstem tumor).

pdf

Annotation 4968126483724

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Ototoxicity (platinum salts: carboplatin is less toxic than cisplatin).

pdf

Annotation 4968127532300

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL The consequence is a disorder of oral communication whose severity rises with the hearing threshold: • major, when the hearing loss is severe or profound (greater than 70 dB); • more or less marked when it is moderate (between 40 and 70 dB), or even mild.
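The severity bands in this card can be written as a threshold function. The sketch below assumes inclusive bounds for the 40–70 dB band. It treats anything under 40 dB as the card's mild case, because the card gives no lower bound. The input is assumed to be the hearing threshold of an ear that already has some hearing loss. The function name is hypothetical.

```python
def communication_impact(threshold_db: float) -> str:
    """Map a hearing-loss threshold (dB) to the communication impact in the card."""
    if threshold_db > 70:    # severe or profound loss
        return "major"
    if threshold_db >= 40:   # moderate loss: 40 to 70 dB
        return "more or less marked"
    # Assumption: anything below 40 dB is treated as the card's "mild" case.
    return "possible (mild loss)"


for level in (80, 70, 40, 25):
    print(level, communication_impact(level))
```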

pdf

Annotation 4968128580876

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Hearing loss in children: normal cooing may appear around 3 months. It is simple "motor play" of the speech organs and can be misleading, but it disappears around the age of 1 year.

pdf

Annotation 4968129629452

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Mild or moderate hearing losses can masquerade as an ordinary delay at school and wrongly steer the diagnosis towards behavioral disorders or a psychological problem. Articulation disorders are common.

pdf

Annotation 4968130678028

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL A child who hears at birth can become hearing-impaired. Because hearing loss can progress, screening is needed both at birth and during the first years.

pdf

Annotation 4968131726604

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Hearing loss in children. In the neonatal period, two objective audiometry techniques are used. Evoked otoacoustic emissions (OEAP) (5% false positives): absent OEAP indicates either hearing loss (without saying how severe) or, more often, poor testing conditions (the infant must be asleep or calm, in a quiet room, with clean external ear canals…). Automated auditory evoked potentials (PEAA) (1% false positives): the sound stimulus is usually delivered at a fixed intensity of 35 dB, and the result is binary, pass or fail. If the test is passed, hearing is presumed normal (except for hearing losses that spare the 2,000 to 4,000 Hz frequencies). If the test fails, this indicates either hearing loss or poor testing conditions. Around 4 months (optional examination): auditory reactions to familiar sounds are studied (mother's voice, feeding bottle, door…). At 9 months: familiar sounds and various sound-making toys, calibrated in frequency and intensity, are used. At 24 months: whispered voice, loud voice and sound-making toys are the most commonly used stimuli. On starting school, around age 6: severe or profound hearing losses have generally already been detected, and the school doctor's audiogram may reveal a mild or moderate hearing loss.
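To make the two false-positive rates concrete, the small sketch below multiplies each rate by a hypothetical cohort of 1,000 screened newborns. It assumes, although the card does not say so, that each rate is the share of hearing newborns who nonetheless fail the test. The function name is illustrative.

```python
def expected_false_positives(n_screened: int, false_positive_rate: float) -> float:
    """Expected number of hearing newborns who nonetheless fail the screening test."""
    return n_screened * false_positive_rate


# Rates quoted in the card; the cohort size of 1,000 is hypothetical.
RATES = {"OEAP": 0.05, "PEAA": 0.01}

for test, rate in RATES.items():
    print(test, expected_false_positives(1000, rate))
# OEAP 50.0
# PEAA 10.0
```

Under this reading, the automated evoked-potential test would send about five times fewer hearing infants for further testing.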

pdf

Annotation 4968132775180

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL The older child: from age 5 (depending on the child's level of psychomotor development), the adult techniques of subjective pure-tone and speech audiometry can be used.

pdf

Annotation 4968133823756

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL The young child: between 10–12 months and 5 years, conditioned-reflex audiometry, performed by ENT physicians, can be used. It relies on establishing a conditioned reflex: a sound stimulus triggers a response after training. The response may be an automatic reflex movement, in which the child turns their head towards the sound source (conditioned orientation reflex, ROC, from 1 year). Alternatively, it may be a voluntary play movement, in which the child presses a button that shows amusing pictures (peep-show) or starts a toy train (train-show) (3–5 years).

pdf

Annotation 4968134872332

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Before 10 months: behavioral audiometry. The ROC cannot be used, but by carefully observing the child's behavior the examiner can detect reactions to sound stimuli (stopping sucking…) and establish the equivalent of a hearing curve for the better ear.

pdf

Annotation 4968135920908

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Objective audiometry: at any age, from birth. Objective audiometry currently relies on recording evoked auditory potentials (PEAP), ASSR (Auditory Steady-State Responses, which test the low frequencies) and OEAP.
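The age ranges in these four cards can be summarized as a lookup. This Python sketch is a simplification with assumed cut-offs. It starts conditioned-reflex audiometry at 10 months, although the card gives 10–12 months and places the ROC from about 1 year. It switches to play audiometry at 36 months and to adult techniques at 60 months. The function name is hypothetical.

```python
def suggested_methods(age_months: int) -> list[str]:
    """Hearing tests the cards describe as usable at a given age (illustrative only)."""
    # Objective audiometry (evoked potentials, ASSR, otoacoustic emissions) works from birth.
    methods = ["objective audiometry (PEAP, ASSR, OEAP)"]
    if age_months < 10:
        methods.append("behavioral audiometry")
    elif age_months < 60:
        # Conditioned-reflex audiometry; the 36-month split is an assumed cut-off.
        if age_months >= 36:
            methods.append("play audiometry (peep-show, train-show)")
        else:
            methods.append("conditioned orientation reflex (ROC)")
    else:
        methods.append("adult pure-tone and speech audiometry")
    return methods


for age in (6, 18, 48, 72):
    print(age, suggested_methods(age))
```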

pdf

Annotation 4968136969484

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL A speech-therapy assessment is essential to complete the work-up of a child's hearing loss.

pdf

Annotation 4968154533132

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Lobstein disease (brittle bone disease, osteogenesis imperfecta) combines hearing loss with bone fragility, blue sclerae and ligamentous hyperlaxity.

pdf

Annotation 4968155581708

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Hearing loss of genetic origin (50 to 60% of cases).

pdf

Annotation 4968156630284

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sensorineural hearing loss: when unilateral, it causes difficulties in noise and with locating sounds. It has no major consequences for language development or social life and is often discovered by chance.

pdf

Annotation 4968157678860

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sensorineural hearing loss: when bilateral, it falls into two groups. First, isolated (non-syndromic) hearing losses: non-progressive, genetic and generally recessive, they account for 60% of severe or profound hearing losses in children. The most frequent mutation involves the gene encoding connexin 26. Second, hearing losses associated (syndromic) with other malformations, which form many (but very rare) syndromes of varying complexity: Usher syndrome (retinitis pigmentosa); Waardenburg syndrome (white forelock, heterochromia iridis); Pendred syndrome (goiter with hypothyroidism); Alport syndrome (renal failure); Jervell and Lange-Nielsen syndrome (cardiac abnormalities with ECG changes such as a long QT, and a risk of sudden death); mucopolysaccharidoses (storage diseases), such as Hurler disease (gargoylism) and Morquio disease.

pdf

Annotation 4968158727436

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Embryopathies and fetopathies account for nearly 15% of severe or profound bilateral hearing losses. TORCH syndrome: Toxoplasmosis; O for "Others" (syphilis, HIV); Rubella; CMV; Herpes.

pdf

Annotation 4968159776012

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sensorineural hearing loss: all moderate to profound sensorineural hearing losses should be fitted with hearing aids early. Fitting is possible from the first months of life.

pdf

Annotation 4968160824588

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL In bilateral severe or profound sensorineural hearing loss where hearing aids give insufficient results, a cochlear implant (an electronic prosthesis with stimulating electrodes implanted in the cochlea) should be considered.

pdf

Annotation 4968161873164

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL Sign language is offered in two situations. The first is bilateral profound hearing loss with no hope of effective hearing rehabilitation through suitable devices (hearing aid or cochlear implant). The second is parental choice (a visual-gestural project).

pdf

Annotation 4968162921740

 #44 #87 #Altération #Audition #Cours #Facultaires #Médecine #ORL The main stages of a child's language development are therefore fundamental landmarks: reacts to noises from birth; coos around 3 months; recognizes their name around 4 months; imitates sounds and intonations around 6 months; starts babbling around 6 months; repeats syllables between 6 and 10 months; first words at 12 months; a few recognizable words at 18 months; uses a 50-word vocabulary and puts two or three words together around 18–24 months.
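The milestones above can be kept as a small ordered table. In this sketch, age ranges are reduced to their upper bound (an assumption). The hypothetical helper `expected_by` lists the milestones the card places at or before a given age in months.

```python
# (age in months, milestone) pairs from the card; for ranges such as
# "between 6 and 10 months" the upper bound is used (an assumption).
MILESTONES = [
    (0, "reacts to noises"),
    (3, "cooing"),
    (4, "recognizes own name"),
    (6, "imitates sounds and intonations"),
    (6, "starts babbling"),
    (10, "repeats syllables"),
    (12, "first words"),
    (18, "a few recognizable words"),
    (24, "50-word vocabulary, combines two or three words"),
]


def expected_by(age_months: int) -> list[str]:
    """Milestones the card places at or before the given age."""
    return [milestone for age, milestone in MILESTONES if age <= age_months]


print(expected_by(4))  # ['reacts to noises', 'cooing', 'recognizes own name']
```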

pdf

Flashcard 4968163970316

Tags
#MLBook #data-origin #machine-learning
Question
Machine learning is a subfield of computer science that is concerned with building algorithms which, to be useful, rely on a collection of examples of some phenomenon. These examples can come from [...].
nature, be handcrafted by humans or generated by another algorithm

Parent (intermediate) annotation

Machine learning is a subfield of computer science that is concerned with building algorithms which, to be useful, rely on a collection of examples of some phenomenon. These examples can come from nature, be handcrafted by humans or generated by another algorithm.

Original toplevel document (pdf)

Annotation 4968179436812

 [unknown IMAGE 4968183893260] #has-images Geometric visualisation of the mode, median and mean of an arbitrary probability density function.

Probability density function - Wikipedia
region describes the probability of an event occurring in that region. Boxplot and probability density function of a normal distribution N(0, σ²). Geometric visualisation of the mode, median and mean of an arbitrary probability density function.[1] In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (
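The same three statistics can also be computed on a finite sample rather than on a density. The short sketch below uses Python's standard `statistics` module and a hypothetical right-skewed sample. In this particular sample mode < median < mean, but that ordering depends on the data and is not guaranteed in general.

```python
import statistics

# Hypothetical right-skewed sample, chosen only for illustration.
sample = [1, 2, 2, 3, 4, 7, 9]

mode = statistics.mode(sample)      # most frequent value
median = statistics.median(sample)  # middle value of the sorted sample
mean = statistics.mean(sample)      # arithmetic average: 28 / 7

print(mode, median, mean)  # 2 3 4
```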

Flashcard 4968186514700

Tags
#has-images
Question
Geometric visualisation of the mode, median and mean of an arbitrary probability density function.
[unknown IMAGE 4968183893260]

Parent (intermediate) annotation

Geometric visualisation of the mode, median and mean of an arbitrary probability density function.

Original toplevel document

Probability density function - Wikipedia

Flashcard 4968188873996

Tags
#MLBook #dataset #examples #machine-learning #sample
Question
Most of the time we don’t know $$f_X$$, but we can observe some values of $$X$$. In machine learning, we call these values [...], and the collection of these examples is called a sample or a dataset.
examples

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Most of the time we don’t know $$f_X$$, but we can observe some values of $$X$$. In machine learning, we call these values examples, and the collection of these examples is called a sample or a dataset.

Original toplevel document (pdf)

Flashcard 4968191757580

Tags
#MLBook #dataset #examples #machine-learning #sample
Question
Most of the time we don’t know $$f_X$$, but we can observe some values of $$X$$. In machine learning, we call these values examples, and the collection of these examples is called [...].
a sample or a dataset

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Most of the time we don’t know $$f_X$$, but we can observe some values of $$X$$. In machine learning, we call these values examples, and the collection of these examples is called a sample or a dataset.

Original toplevel document (pdf)

Flashcard 4968198311180

Tags
#MLBook #SVM #classification-models #kNN #machine-learning #probability
Question
Some classification models, like [...], given a feature vector, only output the class. Others, like logistic regression or decision trees, can also return a score between 0 and 1, which can be interpreted as either how confident the model is about the prediction or as the probability that the input example belongs to a certain class.
SVM and kNN

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Some classification models, like SVM and kNN, given a feature vector, only output the class. Others, like logistic regression or decision trees, can also return a score between 0 and 1, which can be interpreted as either how confi

Original toplevel document (pdf)

Flashcard 4968200146188

Tags
#MLBook #SVM #classification-models #kNN #machine-learning #probability
Question
Some classification models, like SVM and kNN, given a feature vector, only output the class. Others, like [...], can also return a score between 0 and 1, which can be interpreted as either how confident the model is about the prediction or as the probability that the input example belongs to a certain class.
logistic regression or decision trees

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Some classification models, like SVM and kNN, given a feature vector, only output the class. Others, like logistic regression or decision trees, can also return a score between 0 and 1, which can be interpreted as either how confident the model is about the prediction or as the probability that the input example belongs to a c

Original toplevel document (pdf)

Flashcard 4968202767628

Tags
#MLBook #SVM #classification-models #kNN #machine-learning #probability
Question
Some classification models, like SVM and kNN, given a feature vector, only output the class. Others, like logistic regression or decision trees, can also return a score between 0 and 1, which can be interpreted as either [...].
how confident the model is about the prediction or as the probability that the input example belongs to a certain class

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

like SVM and kNN, given a feature vector, only output the class. Others, like logistic regression or decision trees, can also return a score between 0 and 1, which can be interpreted as either how confident the model is about the prediction or as the probability that the input example belongs to a certain class.

Original toplevel document (pdf)

Flashcard 4968205651212

Tags
#MLBook #cardinality-operator #machine-learning
Question
The cardinality operator $$\left\vert \mathcal S \right\vert$$ returns [...].
the number of elements in set $$\mathcal S$$

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

The cardinality operator $$\left\vert \mathcal S \right\vert$$ returns the number of elements in set $$\mathcal S$$.

Original toplevel document (pdf)

Flashcard 4968209059084

Tags
#MLBook #logistic-regression #machine-learning
Question
The first thing to say is that logistic regression is not a regression, but a classification learning algorithm. The name comes from statistics and is due to the fact that [...].
the mathematical formulation of logistic regression is similar to that of linear regression

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

The first thing to say is that logistic regression is not a regression, but a classification learning algorithm. The name comes from statistics and is due to the fact that the mathematical formulation of logistic regression is similar to that of linear regression.
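A minimal sketch of why the formulation resembles linear regression yet solves a classification problem. It assumes hand-picked, hypothetical parameters `w` and `b` (nothing is learned here). The linear term `w·x + b` is passed through the standard logistic function, and the resulting value in (0, 1) is thresholded at 0.5 to produce a class label. The 0.5 threshold is a common default, not something the model itself fixes.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Standard logistic function 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    # Two algebraically equal forms, chosen so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_term(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """The familiar linear-regression expression w·x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict_proba(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Logistic regression output, read as the probability that the label is 1."""
    return sigmoid(linear_term(w, x, b))


def predict_label(w: Sequence[float], x: Sequence[float], b: float, threshold: float = 0.5) -> int:
    """Classification step: label 1 (positive) if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(w, x, b) >= threshold else 0


# Hypothetical, hand-picked parameters (not learned from data).
w, b = [2.0, -1.0], 0.5
print(predict_proba(w, [1.0, 1.0], b))   # sigmoid(1.5), about 0.82
print(predict_label(w, [1.0, 1.0], b))   # 1
print(predict_label(w, [-1.0, 1.0], b))  # linear term is -2.5, so 0
```

The only difference from linear regression is the sigmoid wrapped around `linear_term`. That wrapper is what turns an unbounded score into a value that can be read as a probability.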

Original toplevel document (pdf)

Flashcard 4968214826252

Tags
#MLBook #deep-learning #deep-neural-networks #layer #machine-learning #neural-network #shallow-learning
Question
Differentiate shallow learning from deep learning.
A shallow learning algorithm learns the parameters of the model directly from the features of the training examples. Most supervised learning algorithms are shallow. The notable exceptions are neural network learning algorithms, specifically those that build neural networks with more than one layer between input and output. Such neural networks are called deep neural networks. In deep neural network learning (or, simply, deep learning), contrary to shallow learning, most model parameters are learned not directly from the features of the training examples, but from the outputs of the preceding layers.
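A toy sketch of the distinction, using hypothetical hand-picked weights (nothing is trained). In the shallow model, every parameter multiplies the raw features directly. The deep model has two layers between input and output: only its first layer sees the features, and each later layer's parameters act on the outputs of the preceding layer. The `dense` and `relu` helpers are illustrative names, not the API of any particular library.

```python
from typing import List, Sequence


def dense(W: Sequence[Sequence[float]], b: Sequence[float], inputs: Sequence[float]) -> List[float]:
    """Fully connected layer: one output W[i]·inputs + b[i] per row of W."""
    return [sum(wij * xj for wij, xj in zip(row, inputs)) + bi for row, bi in zip(W, b)]


def relu(v: Sequence[float]) -> List[float]:
    """Element-wise max(0, v_i) non-linearity."""
    return [max(0.0, vi) for vi in v]


def shallow_model(x: Sequence[float]) -> float:
    # All parameters act directly on the features of the example.
    return dense([[0.5, -1.0]], [0.1], x)[0]


def deep_model(x: Sequence[float]) -> float:
    h1 = relu(dense([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0], x))   # sees the raw features
    h2 = relu(dense([[1.0, 0.5], [-1.0, 2.0]], [0.0, 1.0], h1))  # sees only h1
    return dense([[2.0, -3.0]], [0.5], h2)[0]                    # sees only h2


x = [1.0, 2.0]
print(shallow_model(x), deep_model(x))
```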

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

A shallow learning algorithm learns the parameters of the model directly from the features of the training examples. Most supervised learning algorithms are shallow. The notable exceptions are neural network learning algorithms, specifically those that build neural networks with more than one layer between input and output. Such neural networks are called deep neural networks. In deep neural network learning (or, simply, deep learning), contrary to shallow learning, most model parameters are learned not directly from the features of the training examples, but from the outputs of the preceding layers.

Original toplevel document (pdf)

Flashcard 4968223477004

Tags
#MLBook #in-memory-versus-out-of-memory #incremental-learning-algorithms #learning-algorithm-selection #machine-learning
Question
Discuss in-memory vs. out-of-memory considerations when choosing a machine learning algorithm.
Can your dataset be fully loaded into the RAM of your server or personal computer? If yes, then you can choose from a wide variety of algorithms. Otherwise, you would prefer incremental learning algorithms that can improve the model by adding more data gradually.
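A minimal sketch of the incremental alternative. It assumes a linear regression model trained with mini-batch stochastic gradient descent. Data arrive one batch at a time from a generator; each batch updates the parameters and is then discarded, so the whole dataset never has to fit in RAM. `stream_batches` is a hypothetical stand-in for reading chunks from disk or a database. Here it synthesizes noise-free pairs with y = 3x + 1.

```python
import random
from typing import Iterator, List, Tuple

Batch = List[Tuple[float, float]]


def stream_batches(n_batches: int, batch_size: int, seed: int = 0) -> Iterator[Batch]:
    """Hypothetical stand-in for reading a large dataset chunk by chunk.
    Here it synthesizes noise-free (x, y) pairs with y = 3x + 1."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        yield [(x, 3.0 * x + 1.0) for x in xs]


def sgd_update(w: float, b: float, batch: Batch, lr: float) -> Tuple[float, float]:
    """One gradient-descent step on the mean squared error of a single batch."""
    n = len(batch)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b


w, b = 0.0, 0.0
for batch in stream_batches(n_batches=2000, batch_size=32):
    # Only the current batch is in memory; the model improves as more data arrive.
    w, b = sgd_update(w, b, batch, lr=0.1)
print(round(w, 3), round(b, 3))  # converges toward 3.0 and 1.0
```

For real workloads, scikit-learn exposes the same idea through the `partial_fit` method of incremental estimators such as `SGDRegressor` and `SGDClassifier`.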

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

In-memory vs. out-of-memory: Can your dataset be fully loaded into the RAM of your server or personal computer? If yes, then you can choose from a wide variety of algorithms. Otherwise, you would prefer incremental learning algorithms that can improve the model by adding more data gradually.

Original toplevel document (pdf)

Flashcard 4968228457740

Tags
#MLBook #goal #model #supervised-learning
Question
The goal of a supervised learning algorithm is to [...]. For instance, the model created using the dataset of people could take as input a feature vector describing a person and output a probability that the person has cancer.
use the dataset to produce a model that takes a feature vector $$\mathbf x$$ as input and outputs information that allows deducing the label for this feature vector

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

The goal of a supervised learning algorithm is to use the dataset to produce a model that takes a feature vector $$\mathbf x$$ as input and outputs information that allows deducing the label for this feature vector. For instance, the model created using the dataset of people could take as input a feature vector describing a person and output a probability that the person has cancer.

Original toplevel document (pdf)

Flashcard 4968233438476

Tags
#MLBook #bias #features #high-bias #low-bias #machine-learning #underfitting
Question
Discuss underfitting in machine learning.

I mentioned above the notion of bias. I said that a model has a low bias if it predicts the labels of the training data well. If the model makes many mistakes on the training data, we say that the model has a high bias or that the model underfits. So, underfitting is the inability of the model to accurately predict the labels of the data it was trained on. There could be several reasons for underfitting, the most important of which are:

• your model is too simple for the data (for example a linear model can often underfit);
• the features you engineered are not informative enough.

The first reason is easy to illustrate in the case of one-dimensional regression: the dataset can resemble a curved line, but our model is a straight line. The second reason can be illustrated like this: let’s say you want to predict whether a patient has cancer, and the features you have are height, blood pressure, and heart rate. These three features are clearly not good predictors for cancer, so our model will not be able to learn a meaningful relationship between these features and the label.

The solution to the problem of underfitting is to try a more complex model or to engineer features with higher predictive power.
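Both ideas can be reproduced with a toy dataset. The sketch assumes a hypothetical curved dataset, y = x² at evenly spaced points in [-3, 3], and fits it with ordinary least squares twice. The first fit uses a straight line in x, which is too simple a model, so the training error stays high. The second fit applies the same line-fitting routine to an engineered feature z = x², which has far more predictive power for this data.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ≈ a*x + c with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def training_mse(xs: Sequence[float], ys: Sequence[float], a: float, c: float) -> float:
    """Mean squared error of the fitted line on the data it was trained on."""
    return sum((a * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical curved dataset: y = x^2 at 61 evenly spaced points in [-3, 3].
xs = [i / 10 for i in range(-30, 31)]
ys = [x * x for x in xs]

# Reason 1: the model is too simple. A straight line in x cannot follow the curve,
# so the error stays high even on the training data (high bias).
a1, c1 = fit_line(xs, ys)
mse_line = training_mse(xs, ys, a1, c1)

# Remedy via features: the same line-fitting routine on the engineered feature
# z = x^2, which is highly predictive here, fits the training data almost exactly.
zs = [x * x for x in xs]
a2, c2 = fit_line(zs, ys)
mse_feature = training_mse(zs, ys, a2, c2)

print(round(mse_line, 2), mse_feature)  # roughly 7.68 versus essentially 0
```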

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

I mentioned above the notion of bias. I said that a model has a low bias if it predicts the labels of the training data well. If the model makes many mistakes on the training data, we say that the model has a high bias or that the model underfits. So, underfitting is the inability of the model to accurately predict the labels of the data it was trained on. There could be several reasons for underfitting, the most important of which are: your model is too simple for the data (for example a linear model can often underfit); the features you engineered are not informative enough. The first reason is easy to illustrate in the case of one-dimensional regression: the dataset can resemble a curved line, but our model is a straight line. The second reason can be illustrated like this: let’s say you want to predict whether a patient has cancer, and the features you have are height, blood pressure, and heart rate. These three features are clearly not good predictors for cancer, so our model will not be able to learn a meaningful relationship between these features and the label. The solution to the problem of underfitting is to try a more complex model or to engineer features with higher predictive power.

Original toplevel document (pdf)

Flashcard 4968240778508

Tags
#MLBook #machine-learning #semi-supervised-learning
Question
What is semi-supervised learning?

In semi-supervised learning, the dataset contains both labeled and unlabeled examples. Usually, the quantity of unlabeled examples is much higher than the number of labeled examples. The goal of a semi-supervised learning algorithm is the same as the goal of the supervised learning algorithm. The hope here is that using many unlabeled examples can help the learning algorithm to find (we might say “produce” or “compute”) a better model.

It could look counter-intuitive that learning could benefit from adding more unlabeled examples. It seems like we add more uncertainty to the problem. However, when you add unlabeled examples, you add more information about your problem: a larger sample better reflects the probability distribution that the labeled data came from. Theoretically, a learning algorithm should be able to leverage this additional information.

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

In semi-supervised learning, the dataset contains both labeled and unlabeled examples. Usually, the quantity of unlabeled examples is much higher than the number of labeled examples. The goal of a semi-supervised learning algorithm is the same as the goal of the supervised learning algorithm. The hope here is that using many unlabeled examples can help the learning algorithm to find (we might say “produce” or “compute”) a better model. It could look counter-intuitive that learning could benefit from adding more unlabeled examples. It seems like we add more uncertainty to the problem. However, when you add unlabeled examples, you add more information about your problem: a larger sample better reflects the probability distribution that the labeled data came from. Theoretically, a learning algorithm should be able to leverage this additional information.

Original toplevel document (pdf)

Flashcard 4968257555724

Tags
#L1-regularization #MLBook #hyperparameter #machine-learning
Question
Discuss L1 regularization as applied to linear regression.

Recall the linear regression objective:

$$\displaystyle \min_{\mathbf w, b} \frac{1}{N} \displaystyle \sum_{i=1}^N \left( f_{\mathbf w, b} \left( \mathbf x_i \right) - y_i \right)^2. \tag{2}$$

An L1-regularized objective looks like this:

$$\displaystyle \min_{\mathbf w, b} \left[ C \left\vert \mathbf w \right\vert + \frac{1}{N} \displaystyle \sum_{i=1}^N \left( f_{\mathbf w, b} \left( \mathbf x_i \right) - y_i \right)^2 \right], \tag{3}$$

where $$\left\vert \mathbf w \right\vert \stackrel{\textrm{def}}{=} \sum_{j=1}^D \left\vert w^{(j)} \right\vert$$ and $$C$$ is a hyperparameter that controls the importance of regularization. If we set $$C$$ to zero, the model becomes a standard non-regularized linear regression model. On the other hand, if we set $$C$$ to a high value, the learning algorithm will try to set most $$w^{(j)}$$ to a very small value or zero to minimize the objective, and the model will become very simple, which can lead to underfitting. Your role as the data analyst is to find a value of the hyperparameter $$C$$ that doesn’t increase the bias too much but reduces the variance to a level reasonable for the problem at hand.
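The effect of $$C$$ described above can be checked numerically. The sketch below minimizes objective (3) with proximal gradient descent (ISTA). ISTA is a standard method for L1 penalties, swapped in here for illustration rather than taken from the text. Each iteration takes a gradient step on the squared-error term and then applies soft-thresholding, which is the step that sets small weights to exactly zero. The data are hypothetical (y = 2·x1 + 0.1·x2 + 0.5 on four points), chosen so that the exact solution is easy to verify by hand. As is common, b is left unpenalized.

```python
from typing import List, Sequence, Tuple


def soft_threshold(v: float, t: float) -> float:
    """Proximal step for t*|v|: shrink v toward 0, and return exactly 0 when |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def predict(w: Sequence[float], b: float, xi: Sequence[float]) -> float:
    """Linear model f_{w,b}(x) = w·x + b."""
    return sum(wj * xj for wj, xj in zip(w, xi)) + b


def l1_objective(w, b, X, y, C) -> float:
    """Objective (3): C * sum_j |w_j| + (1/N) * sum_i (f(x_i) - y_i)^2."""
    mse = sum((predict(w, b, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)
    return C * sum(abs(wj) for wj in w) + mse


def fit_l1(X, y, C: float, lr: float = 0.1, iters: int = 500) -> Tuple[List[float], float]:
    """Proximal gradient descent (ISTA): gradient step on the squared error, then
    soft-thresholding for the C*|w| term. The bias b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        r = [predict(w, b, xi) - yi for xi, yi in zip(X, y)]  # residuals
        grad_w = [2.0 / n * sum(ri * xi[j] for ri, xi in zip(r, X)) for j in range(d)]
        grad_b = 2.0 / n * sum(r)
        w = [soft_threshold(wj - lr * gj, lr * C) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical data: y = 2*x1 + 0.1*x2 + 0.5, so x2 carries only a weak signal.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [2.0 * x1 + 0.1 * x2 + 0.5 for x1, x2 in X]

for C in (0.0, 0.1, 1.0):
    w, b = fit_l1(X, y, C)
    print(C, [round(wj, 3) for wj in w], round(b, 3))
```

With these four points, the squared-error term reduces to (w1 - 2)² + (w2 - 0.1)² + (b - 0.5)². Each weight therefore solves its own one-dimensional problem, with solution w_j = sign(a_j) · max(|a_j| - C/2, 0), where a_j is the unregularized value. That predicts about [2.0, 0.1] for C = 0, [1.95, 0.05] for C = 0.1, and [1.5, 0.0] for C = 1. The weak weight is driven to exactly zero, while the strong one is only shrunk.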

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Recall the linear regression objective: $$\displaystyle \min_{\mathbf w, b} \frac{1}{N} \displaystyle \sum_{i=1}^N \left( f_{\mathbf w, b} \left( \mathbf x_i \right) - y_i \right)^2. \tag{2}$$ An L1-regularized objective looks like this: $$\displaystyle \min_{\mathbf w, b} \left[ C \left\vert \mathbf w \right\vert + \frac{1}{N} \displaystyle \sum_{i=1}^N \left( f_{\mathbf w, b} \left( \mathbf x_i \right) - y_i \right)^2 \right], \tag{3}$$ where $$\left\vert \mathbf w \right\vert \stackrel{\textrm{def}}{=} \sum_{j=1}^D \left\vert w^{(j)} \right\vert$$ and $$C$$ is a hyperparameter that controls the importance of regularization. If we set $$C$$ to zero, the model becomes a standard non-regularized linear regression model. On the other hand, if we set $$C$$ to a high value, the learning algorithm will try to set most $$w^{(j)}$$ to a very small value or zero to minimize the objective, and the model will become very simple, which can lead to underfitting. Your role as the data analyst is to find a value of the hyperparameter $$C$$ that doesn’t increase the bias too much but reduces the variance to a level reasonable for the problem at hand.

Original toplevel document (pdf)

Flashcard 4968270138636

Tags
#MLBook #decision-boundary
Question
In machine learning, the boundary separating the examples of different classes is called the [...].
decision boundary

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

In machine learning, the boundary separating the examples of different classes is called the decision boundary.

Original toplevel document (pdf)

Flashcard 4968281672972

Tags
#MLBook #clustering #dimensionality-reduction #machine-learning #model #outlier-detection #unsupervised-learning
Question
In unsupervised learning, the dataset is [...]. Again, $$\mathbf x$$ is a feature vector, and the goal of an unsupervised learning algorithm is to create a model that takes a feature vector $$\mathbf x$$ as input and either transforms it into another vector or into a value that can be used to solve a practical problem. For example, in clustering, the model returns the id of the cluster for each feature vector in the dataset. In dimensionality reduction, the output of the model is a feature vector that has fewer features than the input $$\mathbf x$$; in outlier detection, the output is a real number that indicates how $$\mathbf x$$ is different from a “typical” example in the dataset.
a collection of unlabeled examples $$\{\mathbf x_i\}^N_{i=1}$$

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

In unsupervised learning, the dataset is a collection of unlabeled examples $$\{\mathbf x_i\}^N_{i=1}$$. Again, $$\mathbf x$$ is a feature vector, and the goal of an unsupervised learning algorithm is to create a model that takes a feature vector $$\mathbf x$$ as input and either transfor

Original toplevel document (pdf)

Flashcard 4968283245836

Tags
#MLBook #clustering #dimensionality-reduction #machine-learning #model #outlier-detection #unsupervised-learning
Question
In unsupervised learning, the dataset is a collection of unlabeled examples $$\{\mathbf x_i\}^N_{i=1}$$. Again, $$\mathbf x$$ is a feature vector, and the goal of an unsupervised learning algorithm is to [...]. For example, in clustering, the model returns the id of the cluster for each feature vector in the dataset. In dimensionality reduction, the output of the model is a feature vector that has fewer features than the input $$\mathbf x$$; in outlier detection, the output is a real number that indicates how $$\mathbf x$$ is different from a “typical” example in the dataset.
create a model that takes a feature vector $$\mathbf x$$ as input and either transforms it into another vector or into a value that can be used to solve a practical problem
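A minimal sketch of two of these model types on a hypothetical 2D dataset. The clustering model is plain k-means (Lloyd's algorithm) and returns a cluster id for every feature vector. The outlier score is the distance from the dataset mean, a deliberately simple choice that returns a real number for any input. Initializing the centroids with the first k points is an assumption made only to keep the example deterministic; practical implementations usually use random or k-means++ initialization.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means: returns a cluster id per point plus the final centroids."""
    centroids = [list(p) for p in points[:k]]  # deterministic init (assumption)

    def assign() -> List[int]:
        # Each point goes to the cluster whose centroid is nearest.
        return [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]

    for _ in range(iters):
        ids = assign()
        for c in range(k):
            members = [p for p, i in zip(points, ids) if i == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return assign(), centroids


def outlier_score(x: Sequence[float], points: Sequence[Sequence[float]]) -> float:
    """Real number: how far x lies from the mean of the dataset (larger = less typical)."""
    mean = [sum(coord) / len(points) for coord in zip(*points)]
    return math.dist(x, mean)


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
ids, centroids = kmeans(data, k=2)
print(ids)  # [0, 0, 0, 1, 1, 1]
print(outlier_score((3.0, 3.0), data), outlier_score((20.0, -10.0), data))
```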

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

vised learning, the dataset is a collection of unlabeled examples $$\{\mathbf x_i\}^N_{i=1}$$. Again, $$\mathbf x$$ is a feature vector, and the goal of an unsupervised learning algorithm is to create a model that takes a feature vector $$\mathbf x$$ as input and either transforms it into another vector or into a value that can be used to solve a practical problem. For example, in clustering, the model returns the id of the cluster for each feature vector in the dataset. In dimensionality reduction, the output of the model is a feature vector th

Original toplevel document (pdf)

Flashcard 4968284818700

Tags
#MLBook #clustering #dimensionality-reduction #machine-learning #model #outlier-detection #unsupervised-learning
Question
In unsupervised learning, the dataset is a collection of unlabeled examples $$\{\mathbf x_i\}^N_{i=1}$$. Again, $$\mathbf x$$ is a feature vector, and the goal of an unsupervised learning algorithm is to create a model that takes a feature vector $$\mathbf x$$ as input and either transforms it into another vector or into a value that can be used to solve a practical problem. For example, in clustering, the model returns [...]. In dimensionality reduction, the output of the model is a feature vector that has fewer features than the input $$\mathbf x$$; in outlier detection, the output is a real number that indicates how $$\mathbf x$$ is different from a “typical” example in the dataset.
the id of the cluster for each feature vector in the dataset

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

feature vector $$\mathbf x$$ as input and either transforms it into another vector or into a value that can be used to solve a practical problem. For example, in clustering, the model returns the id of the cluster for each feature vector in the dataset. In dimensionality reduction, the output of the model is a feature vector that has fewer features than the input $$\mathbf x$$; in outlier detection, the output is a real number that in

Original toplevel document (pdf)

Flashcard 4968293469452

Tags
#MLBook #SVM #has-images #machine-learning #non-linearity
[unknown IMAGE 4773373938956]
Question

SVM can be adapted to work with datasets that cannot be separated by a hyperplane in its original space. Indeed, if we manage to transform the original space into a space of higher dimensionality, we could hope that the examples will become linearly separable in this transformed space. In SVMs, using a function to implicitly transform the original space into a higher dimensional space during the cost function optimization is called the [...].

The effect of applying the kernel trick is illustrated in Figure 6. As you can see, it’s possible to transform two-dimensional non-linearly-separable data into linearly-separable three-dimensional data using a specific mapping $$\phi: \mathbf x \mapsto \phi (\mathbf x)$$, where $$\phi (\mathbf x)$$ is a vector of higher dimensionality than $$\mathbf x$$. For the example of 2D data in Figure 5 (right), the mapping $$\phi$$ that projects a 2D example $$\mathbf x = \left[ q, p \right]$$ into a 3D space (Figure 6) would look like this: $$\phi \left( \left[ q, p \right] \right) \stackrel{\textrm{def}}{=} \left( q^2, \sqrt{2} qp, p^2\right)$$, where $$\cdot^2$$ means $$\cdot$$ squared. You see now that the data becomes linearly separable in the transformed space.

kernel trick
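A small numerical check of the mapping φ above, using hypothetical points. The first part shows that the dot product of two vectors after applying φ equals the square of their dot product in the original 2D space. This is why the transformation can stay implicit: the kernel k(x, z) = (x·z)² is computed without ever building φ(x). The second part shows why separability improves. In the transformed space, the linear expression z1 + z3 equals q² + p², so the plane z1 + z3 = 1 separates points inside the unit circle from points outside it.

```python
import math
from typing import Sequence, Tuple


def phi(v: Sequence[float]) -> Tuple[float, float, float]:
    """The mapping from the text: [q, p] -> (q^2, sqrt(2)*q*p, p^2)."""
    q, p = v
    return (q * q, math.sqrt(2) * q * p, p * p)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def poly2_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    """(a·b)^2, computed entirely in the original 2D space."""
    return dot(a, b) ** 2


# 1) Same number, two routes: explicit 3D mapping vs. implicit kernel.
x, z = (1.0, 2.0), (3.0, -1.0)
print(dot(phi(x), phi(z)), poly2_kernel(x, z))  # both about 1.0, up to rounding


def side(v: Sequence[float]) -> float:
    """Signed position relative to the plane z1 + z3 = 1 in the transformed space."""
    z1, _, z3 = phi(v)
    return z1 + z3 - 1.0  # equals q^2 + p^2 - 1: negative inside the unit circle


# 2) Hypothetical points inside and outside the unit circle: not linearly separable
#    in 2D, but split by a plane after the mapping.
inner = [(0.1, 0.2), (-0.5, 0.3), (0.0, -0.6)]
outer = [(1.5, 0.0), (-1.0, 1.2), (0.3, -2.0)]
print(all(side(v) < 0 for v in inner), all(side(v) > 0 for v in outer))  # True True
```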

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

rly separable in this transformed space. In SVMs, using a function to implicitly transform the original space into a higher dimensional space during the cost function optimization is called the kernel trick. The effect of applying the kernel trick is illustrated in Figure 6. As you can see, it’s possible to transform two-dimensional non-linearly-separable data into linearly-separable t

Original toplevel document (pdf)

Flashcard 4968301071628

Tags
#bert #knowledge-base-construction #nlp #unfinished
Question
In BERT, the input representation of each token is [...] of its token, segment and position embeddings.
the sum

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

In BERT, the input representation of each token is the sum of its token, segment and position embeddings.
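A toy sketch of that sum. It assumes tiny, randomly filled lookup tables in place of BERT's learned ones; the sizes are hypothetical, whereas BERT-base uses 768-dimensional embeddings. Each token's input vector is the element-wise sum of three rows: one selected by its token id, one by its segment id (sentence A or B), and one by its position in the sequence. The reference implementation also applies layer normalization and dropout to this sum, which the sketch omits.

```python
import random
from typing import List, Sequence

DIM = 4  # toy embedding size (assumption; BERT-base uses 768)
rng = random.Random(0)


def make_table(n_rows: int) -> List[List[float]]:
    """Hypothetical lookup table with one DIM-sized vector per id (random here, learned in BERT)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(n_rows)]


token_table = make_table(10)    # toy vocabulary of 10 token ids
segment_table = make_table(2)   # segment 0 = sentence A, segment 1 = sentence B
position_table = make_table(8)  # toy maximum sequence length of 8


def input_representation(token_ids: Sequence[int], segment_ids: Sequence[int]) -> List[List[float]]:
    """For each position: token embedding + segment embedding + position embedding."""
    reps = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        reps.append([t + s + p for t, s, p in
                     zip(token_table[tok], segment_table[seg], position_table[pos])])
    return reps


# Hypothetical ids: two tokens from sentence A followed by two from sentence B.
reps = input_representation([1, 5, 2, 7], [0, 0, 1, 1])
print(len(reps), len(reps[0]))  # 4 4
```

Because the position embedding is part of the sum, the same token id at two different positions yields two different input vectors.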

Original toplevel document (pdf)

Flashcard 4968302644492

Tags
#bert #knowledge-base-construction #nlp #unfinished
Question
In BERT, the input representation of each token is the sum of its [...], segment and position embeddings.
token

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

In BERT, the input representation of each token is the sum of its token, segment and position embeddings.

Original toplevel document (pdf)

Flashcard 4968304217356

Tags
#bert #knowledge-base-construction #nlp #unfinished
Question
In BERT, the input representation of each token is the sum of its token, [...] and position embeddings.
segment

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

In BERT, the input representation of each token is the sum of its token, segment and position embeddings.

Original toplevel document (pdf)

Flashcard 4968305790220

Tags
#bert #knowledge-base-construction #nlp #unfinished
Question
In BERT, the input representation of each token is the sum of its token, segment and [...] embeddings.
position

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

In BERT, the input representation of each token is the sum of its token, segment and position embeddings.

Original toplevel document (pdf)

Flashcard 4968307363084

Tags
#bert #knowledge-base-construction #nlp #unfinished
Question
In BERT, the input representation of each token is the sum of its token, segment and position [...].
embeddings

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

In BERT, the input representation of each token is the sum of its token, segment and position embeddings.

Original toplevel document (pdf)

Flashcard 4968312343820

Tags
#machine-learning #software-engineering #unfinished
Question
Because of the system-level complexity of machine-learning code, [...] of system behavior in real time is critical.
monitoring

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Because of the system-level complexity of machine-learning code, monitoring of system behavior in real time is critical.

Original toplevel document (pdf)

Flashcard 4968313916684

Tags
#machine-learning #software-engineering #unfinished
Question
Because of the system-level complexity of machine-learning code, monitoring of [...] in real time is critical.
system behavior

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Because of the system-level complexity of machine-learning code, monitoring of system behavior in real time is critical.

Original toplevel document (pdf)

Flashcard 4968315489548

Tags
#machine-learning #software-engineering #unfinished
Question
Because of the system-level complexity of machine-learning code, monitoring of system behavior [...] is critical.
in real time

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Because of the system-level complexity of machine-learning code, monitoring of system behavior in real time is critical.

Original toplevel document (pdf)

Flashcard 4968317062412

Tags
#machine-learning #software-engineering #unfinished
Question
Because of the system-level complexity of machine-learning code, monitoring of system behavior in real time is [...]
critical.

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Because of the system-level complexity of machine-learning code, monitoring of system behavior in real time is critical.

Original toplevel document (pdf)

Flashcard 4968321518860

Tags
#knowledge-base-construction #machine-learning #unfinished
Question
Fonduer aligns the word sequences of the converted PDFs with their original files by checking if both their [...] and number of repeated occurrences before the current word are the same.
characters

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

Fonduer aligns the word sequences of the converted PDFs with their original files by checking if both their characters and number of repeated occurrences before the current word are the same.
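A minimal sketch of the alignment rule as stated. It is written as a hypothetical helper, not Fonduer's actual code. Each word is keyed by its characters together with how many times the same word occurred earlier in its sequence. A word in the converted text is aligned to the word in the original file with the identical key, and words without a matching key stay unaligned. The word lists are made-up examples.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def occurrence_keys(words: Sequence[str]) -> List[Tuple[str, int]]:
    """Key each word by (its characters, number of identical words seen before it)."""
    seen: Counter = Counter()
    keys = []
    for w in words:
        keys.append((w, seen[w]))
        seen[w] += 1
    return keys


def align(converted: Sequence[str], original: Sequence[str]) -> Dict[int, int]:
    """Map word positions in the converted PDF text to positions in the original file.
    A word is aligned only when both its characters and its repeat count match."""
    original_index = {key: i for i, key in enumerate(occurrence_keys(original))}
    return {i: original_index[key]
            for i, key in enumerate(occurrence_keys(converted))
            if key in original_index}


# Made-up word sequences: the converted text dropped the word "Rating".
converted = ["Part", "No", "BC546", "No", "Vce"]
original = ["Part", "No", "BC546", "Rating", "No", "Vce"]
print(align(converted, original))  # {0: 0, 1: 1, 2: 2, 3: 4, 4: 5}
```

Counting prior occurrences is what keeps the second "No" from being matched to the first one in the original.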

Original toplevel document (pdf)

Flashcard 4968492698892

Question
What's hilarious to me is that since the Agile manifesto is so vague, you could say that in many small shops, [...] will organically happen anyway
its "core principles"

status measured difficulty not learned 37% [default] 0

Parent (intermediate) annotation

What's hilarious to me is that since the Agile manifesto is so vague, you could say that in many small shops, its "core principles" will organically happen anyway

Original toplevel document

The Failure of Agile : programming
lmost anything can be considered Agile. Yet most "agile experts" still manage to violate the core principles. Tech_Itch: What's hilarious to me is that since the Agile manifesto is so vague, you could say that its "core principles" will organically happen in many small shops anyway: Individuals and interactions over Processes and tools: Everyone will insist on using their own tools, and fiercely defend their choice. Much time will be spent in "individual interacti

Flashcard 4968719715596

Question
What's hilarious to me is that since the Agile manifesto is so vague, you could say that in many small shops, its "core principles" [...]