BuboFlash - helps with learning

Edited, memorised or added to reading queue

Flashcard 3814584093964

Question

In Linux, the kernel caches write operations in memory for performance reasons; these are then flushed (physically committed to the hard disk) every so often. Write the command that forces a flush to disk:

Answer

sync
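
A program can request the same flush through system calls. The following is a minimal Python sketch that assumes a Unix system and Python 3.3+, where `os.sync()` is available. `os.fsync()` flushes a single file's data, while `os.sync()` roughly corresponds to running `sync`. The helper names `write_durably` and `flush_everything` are hypothetical.

```python
import os
import tempfile

def write_durably(path: str, data: bytes) -> None:
    """Write data and ask the kernel to commit this file to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer into the kernel's cache
        os.fsync(f.fileno())  # block until the kernel has flushed this file's data

def flush_everything() -> None:
    """Rough equivalent of the `sync` command: flush all cached writes (Unix only)."""
    os.sync()

path = os.path.join(tempfile.mkdtemp(), "example.txt")
write_durably(path, b"hello\n")
flush_everything()
```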

19. Partitions, File Systems, Formatting, Mounting
…caches write operations in memory for performance reasons. These flush (physically commit to the magnetic media) every so often, but you sometimes want to force a flush. This is done simply with sync.

Annotation 4761704074508

The conditions most commonly associated are those belonging to the atopic group of diseases, such as allergic conjunctivitis, allergic rhinitis, bronchial asthma, allergic enterocolitis and, less frequently, urticaria, food allergy, and anaphylaxis.

pdf

Flashcard 4761705647372

Question

The conditions most commonly associated are those belonging to the atopic group of diseases, such as [...], allergic rhinitis, bronchial asthma, allergic enterocolitis and, less frequently, urticaria, food allergy, and anaphylaxis.

Answer

allergic conjunctivitis

Parent (intermediate) annotation

The conditions most commonly associated are those belonging to the atopic group of diseases, such as allergic conjunctivitis, allergic rhinitis, bronchial asthma, allergic enterocolitis and, less frequently, urticaria, food allergy, and anaphylaxis.

Original toplevel document (pdf)

Annotation 4761707482380

#deep-learning

Deep learning isn’t always the right tool for the job—sometimes there isn’t enough data for deep learning to be applicable, and sometimes the problem is better solved by a different algorithm. If deep learning is your first contact with machine learning, then you may find yourself in a situation where all you have is the deep-learning hammer, and every machine-learning problem starts to look like a nail. The only way not to fall into this trap is to be familiar with other approaches and practice them when appropriate.

pdf

Annotation 4761709841676

#probabilistic-modeling

Probabilistic modeling is the application of the principles of statistics to data analysis. It was one of the earliest forms of machine learning, and it’s still widely used to this day. One of the best-known algorithms in this category is the Naive Bayes algorithm.

pdf

Annotation 4761712725260

#Bayes-theorem

Bayes’ theorem is stated mathematically as the following equation:^[2]

\({\displaystyle P(A\mid B)={\frac {P(B\mid A)P(A)}{P(B)}}}\)

where \(A\) and \(B\) are events and \({\displaystyle P(B)\neq 0}\).

\(P(A\mid B)\) is a conditional probability: the likelihood of event \(A\) occurring given that \(B\) is true.
\({\displaystyle P(B\mid A)}\) is also a conditional probability: the likelihood of event \(B\) occurring given that \(A\) is true.
\(P(A)\) and \(P(B)\) are the probabilities of observing \(A\) and \(B\) respectively; they are known as the marginal probabilities.
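
The following minimal Python sketch gives a quick numerical check of the formula. The function name `bayes_posterior` and the example probabilities are illustrative assumptions. They are not values from the source.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("Bayes' theorem requires P(B) != 0")
    return p_b_given_a * p_a / p_b

# Illustrative values: P(B | A) = 0.8, P(A) = 0.1, P(B) = 0.2
print(bayes_posterior(0.8, 0.1, 0.2))   # approximately 0.4
```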

Bayes' theorem - Wikipedia

Annotation 4761715608844

#Bayes-theorem-example #has-images

Suppose that a test for using a particular drug is 99% sensitive and 99% specific. That is, the test will produce 99% true positive results for drug users and 99% true negative results for non-drug users. Suppose that 0.5% of people are users of the drug. What is the probability that a randomly selected individual with a positive test is a drug user?

\({\displaystyle {\begin{aligned}P({\text{User}}\mid {\text{+}})&={\frac {P({\text{+}}\mid {\text{User}})P({\text{User}})}{P(+)}}\\&={\frac {P({\text{+}}\mid {\text{User}})P({\text{User}})}{P({\text{+}}\mid {\text{User}})P({\text{User}})+P({\text{+}}\mid {\text{Non-user}})P({\text{Non-user}})}}\\[8pt]&={\frac {0.99\times 0.005}{0.99\times 0.005+0.01\times 0.995}}\\[8pt]&\approx 33.2\%\end{aligned}}}\)

Even if an individual tests positive, it is more likely that they do not use the drug than that they do. This is because the number of non-users is large compared to the number of users. The number of false positives outweighs the number of true positives. For example, if 1000 individuals are tested, there are expected to be 995 non-users and 5 users. From the 995 non-users, 0.01 × 995 ≈ 10 false positives are expected. From the 5 users, 0.99 × 5 ≈ 5 true positives are expected. Out of 15 positive results, only 5 are genuine.
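
The arithmetic above can be reproduced in a few lines. In this minimal Python sketch, `posterior_given_positive` is a hypothetical helper. It expands \(P(+)\) over users and non-users exactly as in the second line of the derivation.

```python
def posterior_given_positive(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(User | +) via Bayes' theorem, with P(+) expanded over users and non-users."""
    true_pos = sensitivity * prevalence                # P(+ | User) P(User)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(+ | Non-user) P(Non-user)
    return true_pos / (true_pos + false_pos)

p = posterior_given_positive(0.99, 0.99, 0.005)
print(f"P(User | +) = {p:.1%}")   # prints "P(User | +) = 33.2%"

# Expected counts among 1000 tested people
n = 1000
users = n * 0.005
non_users = n - users
print(f"true positives:  {0.99 * users:.2f}")      # ~5
print(f"false positives: {0.01 * non_users:.2f}")  # ~10
```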

Bayes' theorem - Wikipedia

Annotation 4761721113868

#Naive-Bayes

Naive Bayes is a type of machine-learning classifier based on applying Bayes’ theorem while assuming that the features in the input data are all independent (a strong, or “naive” assumption, which is where the name comes from).
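
The following minimal Python sketch makes the independence assumption concrete with a Naive Bayes classifier for categorical features. It assumes add-one (Laplace) smoothing, which the text does not mention. The toy weather data and the function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][i][v]: how often feature i took value v within class `label`
    value_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = [set() for _ in samples[0]]   # distinct values per feature, for smoothing
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[y][i][v] += 1
            vocab[i].add(v)
    return class_counts, value_counts, vocab

def predict_naive_bayes(model, x, alpha=1.0):
    """Return the label maximizing log P(label) + sum_i log P(x_i | label)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, v in enumerate(x):
            # "Naive" step: each feature contributes independently given the label;
            # add-alpha smoothing keeps unseen values from zeroing the product
            p = (value_counts[label][i][v] + alpha) / (count + alpha * len(vocab[i]))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (outlook, windy) -> play outside?
samples = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
           ("rainy", "yes"), ("overcast", "no"), ("sunny", "no")]
labels = ["yes", "no", "no", "no", "yes", "yes"]
model = train_naive_bayes(samples, labels)
print(predict_naive_bayes(model, ("sunny", "no")))    # yes
print(predict_naive_bayes(model, ("rainy", "yes")))   # no
```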

pdf

Annotation 4761723473164

#logistic-regression

A closely related model is the logistic regression (logreg for short), which is sometimes considered to be the “hello world” of modern machine learning. Don’t be misled by its name—logreg is a classification algorithm rather than a regression algorithm. Much like Naive Bayes, logreg predates computing by a long time, yet it’s still useful to this day, thanks to its simple and versatile nature. It’s often the first thing a data scientist will try on a dataset to get a feel for the classification task at hand.
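
The following minimal Python sketch trains logistic regression by plain batch gradient descent on the log loss. The learning rate, epoch count, toy data, and function names are illustrative assumptions. Despite the name, the model outputs a class probability, and classification comes from thresholding that probability at 0.5.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_logreg(xs, ys, lr=0.1, epochs=1000):
    """Fit w and b by full-batch gradient descent on the mean log loss."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y   # derivative of the log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical, linearly separable toy data
xs = [[1, 1], [2, 1], [1, 2], [-1, -1], [-2, -1], [-1, -2]]
ys = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(xs, ys)
print([round(predict_proba(w, b, x)) for x in xs])   # [1, 1, 1, 0, 0, 0]
```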

pdf

Annotation 4761726094604

#early-neural-networks

Although the core ideas of neural networks were investigated in toy forms as early as the 1950s, the approach took decades to get started. For a long time, the missing piece was an efficient way to train large neural networks. This changed in the mid-1980s, when multiple people independently rediscovered the Backpropagation algorithm—a way to train chains of parametric operations using gradient-descent optimization (later in the book, we’ll precisely define these concepts)—and started applying it to neural networks.
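
The idea of backpropagating gradients through a chain of parametric operations can be sketched on a two-step chain \(h = \tanh(w_1 x)\), \(\hat{y} = w_2 h\) with squared-error loss. This minimal Python example applies the chain rule backward, checks the result against finite differences, and takes one gradient-descent step. The chain, the values, and the names are hypothetical.

```python
import math

def loss_and_grads(w1, w2, x, y):
    """Forward pass, then gradients computed backward with the chain rule."""
    h = math.tanh(w1 * x)            # first parametric operation
    out = w2 * h                     # second parametric operation
    loss = (out - y) ** 2
    d_out = 2 * (out - y)            # dL/d(out)
    d_w2 = d_out * h                 # out = w2 * h
    d_h = d_out * w2
    d_w1 = d_h * (1 - h ** 2) * x    # tanh'(u) = 1 - tanh(u)^2
    return loss, d_w1, d_w2

w1, w2, x, y = 0.5, -0.3, 1.5, 0.8
loss, g1, g2 = loss_and_grads(w1, w2, x, y)

# Sanity check against central finite differences
eps = 1e-6
num_g1 = (loss_and_grads(w1 + eps, w2, x, y)[0] - loss_and_grads(w1 - eps, w2, x, y)[0]) / (2 * eps)
num_g2 = (loss_and_grads(w1, w2 + eps, x, y)[0] - loss_and_grads(w1, w2 - eps, x, y)[0]) / (2 * eps)
print(g1, num_g1)
print(g2, num_g2)

# One gradient-descent step on both parameters lowers the loss
lr = 0.1
new_loss = loss_and_grads(w1 - lr * g1, w2 - lr * g2, x, y)[0]
print(loss, "->", new_loss)
```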

pdf

Annotation 4761728453900

#early-neural-networks #first-successful-practical-application

The first successful practical application of neural nets came in 1989 from Bell Labs, when Yann LeCun combined the earlier ideas of convolutional neural networks and backpropagation, and applied them to the problem of classifying handwritten digits. The resulting network, dubbed LeNet, was used by the United States Postal Service in the 1990s to automate the reading of ZIP codes on mail envelopes.

pdf

Annotation 4761732910348

#CNN #convolutional-neural-network

The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation. Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.^[9]
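
The following minimal Python sketch shows the 2D operation a convolutional layer applies, with a single channel, no padding, and stride 1. Like most deep-learning libraries, it actually computes cross-correlation: the kernel is not flipped. The image and the edge-detecting kernel are illustrative assumptions. Each output value is a weighted sum over a small local window, and the same weights are reused at every position.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution as used in CNN layers (cross-correlation, stride 1)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Weighted sum of the kh x kw window whose top-left corner is (r, c)
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * kernel[i][j]
            row.append(s)
        out.append(row)
    return out

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1],
                 [-1, 1]]
print(conv2d_valid(image, vertical_edge))   # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```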

Convolutional neural network - Wikipedia

Annotation 4761735793932

CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage.

Convolutional neural network - Wikipedia

Annotation 4761741036812

#kernel-methods

As neural networks started to gain some respect among researchers in the 1990s, thanks to this first success, a new approach to machine learning rose to fame and quickly sent neural nets back to oblivion: kernel methods. Kernel methods are a group of classification algorithms, the best known of which is the support vector machine (SVM).

pdf

Annotation 4761750473996

#SVM #has-images #support-vector-machine

SVMs aim at solving classification problems by finding good decision boundaries (see figure 1.10) between two sets of points belonging to two different categories. A decision boundary can be thought of as a line or surface separating your training data into two spaces corresponding to two categories. To classify new data points, you just need to check which side of the decision boundary they fall on.
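
Checking which side of a linear decision boundary a point falls on amounts to looking at the sign of \(w \cdot x + b\). In the following minimal Python sketch, the boundary \(x_1 + x_2 - 3 = 0\) and the function names are hypothetical.

```python
def decision_value(w, b, x):
    """Signed score w . x + b; its sign says which side of the boundary x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical boundary in 2D: the line x1 + x2 - 3 = 0
w, b = [1.0, 1.0], -3.0
print(classify(w, b, [0.5, 0.5]))   # -1
print(classify(w, b, [3.0, 2.0]))   # 1
```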

pdf

Annotation 4761753357580

#SVM #has-images #support-vector-machine

SVMs proceed to find these boundaries in two steps:

1. The data is mapped to a new high-dimensional representation where the decision boundary can be expressed as a hyperplane (if the data was two-dimensional, as in figure 1.10, a hyperplane would be a straight line).
2. A good decision boundary (a separation hyperplane) is computed by trying to maximize the distance between the hyperplane and the closest data points from each class, a step called maximizing the margin. This allows the boundary to generalize well to new samples outside of the training dataset.
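
Step 2 can be illustrated by scoring candidate hyperplanes by their margin. The margin is the smallest value of \(y_i (w \cdot x_i + b) / \lVert w \rVert\) over the training points, that is, the distance from the closest point to the hyperplane. This minimal Python sketch only compares two hand-picked candidates. It does not perform the mapping of step 1 or the actual margin optimization, and the points and lines are invented for illustration.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the hyperplane w . x + b = 0.

    A negative value means at least one point is on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

points = [[1.0, 1.0], [2.0, 0.0], [4.0, 4.0], [5.0, 3.0]]
labels = [-1, -1, 1, 1]

# Two hypothetical separating lines; step 2 prefers the one with the wider margin
candidates = {
    "x1 = 3": ([1.0, 0.0], -3.0),
    "x1 + x2 = 5": ([1.0, 1.0], -5.0),
}
for name, (w, b) in candidates.items():
    print(name, round(margin(w, b, points, labels), 3))   # 1.0, then 2.121
```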

pdf

Annotation 4761757289740

#SVM #support-vector-machine

At the time they were developed, SVMs exhibited state-of-the-art performance on simple classification problems and were one of the few machine-learning methods backed by extensive theory and amenable to serious mathematical analysis, making them well understood and easily interpretable. Because of these useful properties, SVMs became extremely popular in the field for a long time.

pdf

Annotation 4761762532620

#SVM #support-vector-machine

In machine learning, support-vector machines (SVMs, also support-vector networks^[1]) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting). An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on the side of the gap on which they fall.
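
One way to train a linear SVM is stochastic subgradient descent on the L2-regularized hinge loss, in the style of the Pegasos algorithm. The following Python sketch is a minimal version that rests on several assumptions. The bias is folded in as a constant extra feature, so it is regularized too, which is a simplification. The hyperparameters, toy data, and function names are hypothetical. Production SVMs typically rely on dedicated solvers instead.

```python
import random

def train_linear_svm(points, labels, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    Approximately minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)), where each
    x gets an extra constant 1.0 feature so the last weight acts as a bias.
    Labels must be +1 or -1.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(points, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                          # decreasing step size
            inside_margin = y * sum(wi * xi for wi, xi in zip(w, x)) < 1
            w = [(1 - eta * lam) * wi for wi in w]         # shrink: regularizer gradient
            if inside_margin:                              # hinge loss is active here
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Side of the learned hyperplane: +1 or -1."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

points = [(2, 2), (3, 1), (1, 3), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(points, labels)
print([predict(w, p) for p in points])
```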

Support-vector machine - Wikipedia

Annotation 4761765416204

#SVM #kernel #support-vector-machine

A kernel function is a computationally tractable operation that maps any two points in your initial space to the inner product (a measure of similarity) of these points in your target representation space, completely bypassing the explicit computation of the new representation.
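
The kernel trick can be checked numerically for the degree-2 polynomial kernel \(k(x, z) = (x \cdot z)^2\) on 2D inputs, whose explicit feature map is \(\phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)\). The kernel returns the inner product of the mapped points without ever building them. Squared distances in the target space also follow from kernel values alone, as \(k(x,x) - 2k(x,z) + k(z,z)\). The following minimal Python sketch uses illustrative inputs and hypothetical function names.

```python
import math

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x . z)^2, computed in the original 2D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit 3D feature map whose dot product the kernel reproduces."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def feature_space_sq_distance(x, z):
    """||phi(x) - phi(z)||^2 computed from kernel evaluations only."""
    return poly2_kernel(x, x) - 2 * poly2_kernel(x, z) + poly2_kernel(z, z)

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(poly2_kernel(x, z), explicit)        # equal up to floating-point rounding
print(feature_space_sq_distance(x, z))     # 123.0
```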

pdf

Annotation 4761770134796

#decision-tree #has-images

Decision trees are flowchart-like structures that let you classify input data points or predict output values given inputs (see figure 1.11). They’re easy to visualize and interpret. Decision trees learned from data began to receive significant research interest in the 2000s, and by 2010 they were often preferred to kernel methods.
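
The following minimal Python sketch shows how a decision tree classifies an input by following its flowchart from the root to a leaf. The tree here is hand-built, not learned. The feature names, thresholds, and labels are illustrative assumptions, loosely styled after iris measurements.

```python
# Internal nodes test one feature against a threshold; leaves hold a class label.
tree = {
    "feature": "petal_length", "threshold": 2.5,
    "left": {"label": "setosa"},
    "right": {
        "feature": "petal_width", "threshold": 1.75,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}

def predict(node, sample):
    """Follow the flowchart from the root until a leaf is reached."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

print(predict(tree, {"petal_length": 1.4, "petal_width": 0.2}))   # setosa
print(predict(tree, {"petal_length": 5.1, "petal_width": 2.3}))   # virginica
```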

pdf

Annotation 4761774853388

#decision-tree #random-forest

The Random Forest algorithm introduced a robust, practical take on decision-tree learning that involves building a large number of specialized decision trees and then ensembling their outputs.
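
The following Python sketch is a toy version of the idea under simplifying assumptions. Each "tree" is a depth-1 stump trained on a bootstrap sample of the data, using one randomly chosen feature. The forest combines the stumps by majority vote. Real Random Forest implementations grow deeper trees and sample a subset of features at every split. All names and data here are hypothetical.

```python
import random
from collections import Counter

def fit_stump(xs, ys, feature):
    """Depth-1 tree on one feature: choose the threshold with the fewest errors."""
    values = sorted(set(x[feature] for x in xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        for above in (0, 1):  # class predicted when x[feature] > t
            errors = sum((above if x[feature] > t else 1 - above) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, above)
    if best is None:  # only one distinct value in this sample: predict the majority class
        majority = Counter(ys).most_common(1)[0][0]
        return lambda x: majority
    _, t, above = best
    return lambda x: above if x[feature] > t else 1 - above

def fit_random_forest(xs, ys, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(xs)) for _ in xs]   # bootstrap: sample rows with replacement
        feature = rng.randrange(len(xs[0]))           # random feature for this tree
        trees.append(fit_stump([xs[i] for i in rows], [ys[i] for i in rows], feature))
    return trees

def predict_forest(trees, x):
    """Ensemble the trees' outputs by majority vote (labels 0/1)."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters
xs = [(0.0, 0.2), (0.5, 0.9), (1.0, 0.1), (0.3, 0.6), (0.8, 1.0),
      (4.0, 4.5), (4.6, 4.1), (5.0, 4.9), (4.2, 5.0), (4.8, 4.3)]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
forest = fit_random_forest(xs, ys)
print(predict_forest(forest, (0.0, 0.0)), predict_forest(forest, (5.0, 5.0)))
```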

pdf

Annotation 4761777212684

#gradient-boosting-machine

A gradient boosting machine, much like a random forest, is a machine-learning technique based on ensembling weak prediction models, generally decision trees.

pdf

Annotation 4761779834124

#gradient-boosting

[A gradient boosting machine] uses gradient boosting, a way to improve any machine-learning model by iteratively training new models that specialize in addressing the weak points of the previous models.
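
For regression with squared loss, the "weak points" of the current model are its residuals \(y - \hat{y}\), which are the negative gradient of \(\tfrac{1}{2}(y - \hat{y})^2\). The following minimal Python sketch fits each new regression stump to those residuals and adds it with a small learning rate. The loss choice, the stump learners, the hyperparameters, and the names are illustrative assumptions, not any particular library's implementation.

```python
def fit_regression_stump(xs, targets):
    """Depth-1 regression tree on 1D inputs (needs at least two distinct xs)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each new stump fits the current residuals."""
    base = sum(ys) / len(ys)                 # start from a constant prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # negative gradient of the loss
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [x * x for x in xs]
for rounds in (1, 10, 50):
    print(rounds, round(mse(gradient_boost(xs, ys, n_rounds=rounds), xs, ys), 2))
```

The training error should shrink as rounds are added, because each stump corrects part of what the previous ensemble got wrong.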

pdf

Annotation 4761801067788

#convnets #deep-convolutional-neural-networks #deep-learning

Since 2012, deep convolutional neural networks (convnets) have become the go-to algorithm for all computer vision tasks; more generally, they work on all perceptual tasks. At major computer vision conferences in 2015 and 2016, it was nearly impossible to find presentations that didn’t involve convnets in some form. At the same time, deep learning has also found applications in many other types of problems, such as natural-language processing. It has completely replaced SVMs and decision trees in a wide range of applications. For instance, for several years, the European Organization for Nuclear Research, CERN, used decision tree–based methods for analysis of particle data from the ATLAS detector at the Large Hadron Collider (LHC); but CERN eventually switched to Keras-based deep neural networks due to their higher performance and ease of training on large datasets.

pdf

Annotation 4761805262092

#deep-learning

The primary reason deep learning took off so quickly is that it offered better performance on many problems. But that’s not the only reason. Deep learning also makes problem-solving much easier, because it completely automates what used to be the most crucial step in a machine-learning workflow: feature engineering.

pdf

Annotation 4761807621388

#SVM #deep-learning #feature-engineering #shallow-learning

Previous machine-learning techniques—shallow learning—only involved transforming the input data into one or two successive representation spaces, usually via simple transformations such as high-dimensional non-linear projections (SVMs) or decision trees. But the refined representations required by complex problems generally can’t be attained by such techniques. As such, humans had to go to great lengths to make the initial input data more amenable to processing by these methods: they had to manually engineer good layers of representations for their data. This is called feature engineering. Deep learning, on the other hand, completely automates this step: with deep learning, you learn all features in one pass rather than having to engineer them yourself. This has greatly simplified machine-learning workflows, often replacing sophisticated multistage pipelines with a single, simple, end-to-end deep-learning model.
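
The following tiny Python illustration, with invented data, shows what feature engineering means. Points inside and outside the unit circle cannot be separated by a straight line in raw \((x, y)\) coordinates, because the point (0.1, 0.2) lies inside the triangle formed by the three outer points. A hand-crafted feature, the squared distance from the origin, makes a single threshold sufficient. The function name is hypothetical.

```python
# Hypothetical toy data: label 1 if a point lies outside the unit circle
points = [(0.1, 0.2), (-0.3, 0.4), (0.5, -0.5), (1.5, 0.0), (-1.2, 1.1), (0.2, -1.8)]
labels = [0, 0, 0, 1, 1, 1]

def engineered_feature(p):
    """Hand-crafted representation: squared distance from the origin."""
    x, y = p
    return x * x + y * y

# In the new one-dimensional representation, a single threshold separates the classes
predictions = [1 if engineered_feature(p) > 1.0 else 0 for p in points]
print(predictions == labels)   # True
```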

pdf

Annotation 4761809980684

#deep-learning #essential-characteristics

These are the two essential characteristics of how deep learning learns from data: the incremental, layer-by-layer way in which increasingly complex representations are developed, and the fact that these intermediate incremental representations are learned jointly, each layer being updated to follow both the representational needs of the layer above and the needs of the layer below. Together, these two properties have made deep learning vastly more successful than previous approaches to machine learning.

pdf

Annotation 4761812339980

#Kaggle #applied-machine-learning #deep-learning #gradient-boosting-machines

These are the two techniques you should be the most familiar with in order to be successful in applied machine learning today: gradient boosting machines, for shallow-learning problems; and deep learning, for perceptual problems. In technical terms, this means you’ll need to be familiar with XGBoost and Keras—the two libraries that currently dominate Kaggle competitions. With this book in hand, you’re already one big step closer.

pdf

Annotation 4761814699276

#deep-learning #technical-forces

In general, three technical forces are driving advances in machine learning:

Hardware
Datasets and benchmarks
Algorithmic advances

pdf

Annotation 4761817058572

#algorithmic-advances #deep-learning #machine-learning

Because the field is guided by experimental findings rather than by theory, algorithmic advances only become possible when appropriate data and hardware are available to try new ideas (or scale up old ideas, as is often the case). Machine learning isn’t mathematics or physics, where major advances can be made with a pen and a piece of paper. It’s an engineering science.

pdf

Annotation 4761819417868

#deep-learning

The real bottlenecks throughout the 1990s and 2000s were data and hardware. But here’s what happened during that time: the internet took off, and high-performance graphics chips were developed for the needs of the gaming market.

pdf

Annotation 4761820990732

#GPU #deep-learning

Throughout the 2000s, companies like NVIDIA and AMD invested billions of dollars in developing fast, massively parallel chips (graphics processing units [GPUs]) to power the graphics of increasingly photorealistic video games.

pdf

Annotation 4761823350028

#CUDA #NVIDIA #deep-learning

In 2007, NVIDIA launched CUDA (https://developer.nvidia.com/about-cuda), a programming interface for its line of GPUs.

pdf

Annotation 4761828330764

#CUDA #deep-learning

Deep neural networks, consisting mostly of many small matrix multiplications, are also highly parallelizable; and around 2011, some researchers began to write CUDA implementations of neural nets.

pdf

Annotation 4761830690060

#Google #TPU #deep-learning

The deep-learning industry is starting to go beyond GPUs and is investing in increasingly specialized, efficient chips for deep learning. In 2016, at its annual I/O convention, Google revealed its tensor processing unit (TPU) project: a new chip design developed from the ground up to run deep neural networks, which is reportedly 10 times faster and far more energy efficient than top-of-the-line GPUs.

pdf

Annotation 4761833049356

#dataset #deep-learning #internet

When it comes to data, in addition to the exponential progress in storage hardware over the past 20 years (following Moore’s law), the game changer has been the rise of the internet, making it feasible to collect and distribute very large datasets for machine learning.

pdf

Annotation 4761835408652

#deep-learning #internet

User-generated image tags on Flickr, for instance, have been a treasure trove of data for computer vision. So are YouTube videos. And Wikipedia is a key dataset for natural-language processing.

pdf

Annotation 4761837767948

#ImageNet #deep-learning

If there’s one dataset that has been a catalyst for the rise of deep learning, it’s the ImageNet dataset, consisting of 1.4 million images that have been hand annotated with 1,000 image categories (1 category per image).

pdf

Annotation 4761840127244

#deep-learning

Around 2009–2010 (...) [there was] (...) the advent of several simple but important algorithmic improvements that allowed for better gradient propagation:

Better activation functions for neural layers
Better weight-initialization schemes, starting with layer-wise pretraining, which was quickly abandoned
Better optimization schemes, such as RMSProp and Adam

Only when these improvements began to allow for training models with 10 or more layers did deep learning start to shine.

Finally, in 2014, 2015, and 2016, even more advanced ways to help gradient propagation were discovered, such as batch normalization, residual connections, and depth-wise separable convolutions. Today we can train from scratch models that are thousands of layers deep.
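
As an example of the optimization schemes mentioned above, the following minimal Python sketch applies the Adam update rule to a one-parameter toy objective. Adam keeps running averages of the gradient and of its square and applies a bias correction to both. The hyperparameters follow the commonly used defaults except for the learning rate. The objective and the function name are illustrative.

```python
import math

def adam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a one-parameter function with the Adam update rule, given its gradient."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g        # running average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
print(adam_minimize(lambda th: 2 * (th - 3), theta=0.0))   # close to 3
```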

pdf

Annotation 4761842486540

#deep-learning #democratization

One of the key factors driving this inflow of new faces in deep learning has been the democratization of the toolsets used in the field. In the early days, doing deep learning required significant C++ and CUDA expertise, which few people possessed. Nowadays, basic Python scripting skills suffice to do advanced deep-learning research. This has been driven most notably by the development of Theano and then TensorFlow—two symbolic tensor-manipulation frameworks for Python that support autodifferentiation, greatly simplifying the implementation of new models—and by the rise of user-friendly libraries such as Keras, which makes deep learning as easy as manipulating LEGO bricks. After its release in early 2015, Keras quickly became the go-to deep-learning solution for large numbers of new startups, graduate students, and researchers pivoting into the field.

pdf

Annotation 4761844845836

#deep-learning #properties #revolution

Deep learning has several properties that justify its status as an AI revolution, and it’s here to stay. We may not be using neural networks two decades from now, but whatever we use will directly inherit from modern deep learning and its core concepts. These important properties can be broadly sorted into three categories:

Simplicity—Deep learning removes the need for feature engineering, replacing complex, brittle, engineering-heavy pipelines with simple, end-to-end trainable models that are typically built using only five or six different tensor operations.
Scalability—Deep learning is highly amenable to parallelization on GPUs or TPUs, so it can take full advantage of Moore’s law. In addition, deep-learning models are trained by iterating over small batches of data, allowing them to be trained on datasets of arbitrary size. (The only bottleneck is the amount of parallel computational power available, which, thanks to Moore’s law, is a fast-moving barrier.)
Versatility and reusability—Unlike many prior machine-learning approaches, deep-learning models can be trained on additional data without restarting from scratch, making them viable for continuous online learning—an important property for very large production models. Furthermore, trained deep-learning models are repurposable and thus reusable: for instance, it’s possible to take a deep-learning model trained for image classification and drop it into a video-processing pipeline. This allows us to reinvest previous work into increasingly complex and powerful models. This also makes deep learning applicable to fairly small datasets.
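
The batch-wise training mentioned under Scalability, and the ability to keep training on new data mentioned under Versatility, can be illustrated with a very small sketch. It is not how any particular framework implements training: it assumes a one-feature linear model trained with plain mini-batch stochastic gradient descent on mean squared error, with data streamed from a generator so only one batch is in memory at a time. The function names, the learning rate, and the synthetic rule y = 3x + 2 are all hypothetical.

```python
import random
from typing import Iterator, List, Tuple

Batch = List[Tuple[float, float]]


def stream_batches(n_examples: int, batch_size: int, seed: int = 0) -> Iterator[Batch]:
    """Hypothetical data source: synthetic (x, y) pairs with y = 3x + 2 plus noise,
    yielded one small batch at a time so the full dataset is never held in memory."""
    rng = random.Random(seed)
    batch: Batch = []
    for _ in range(n_examples):
        x = rng.uniform(-1.0, 1.0)
        batch.append((x, 3.0 * x + 2.0 + rng.gauss(0.0, 0.1)))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller batch
        yield batch


def sgd_train(w: float, b: float, batches: Iterator[Batch], lr: float = 0.1) -> Tuple[float, float]:
    """Mini-batch SGD on mean squared error for the model y_hat = w * x + b.
    Starts from the given (w, b), so it can resume training on new data."""
    for batch in batches:
        n = len(batch)
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Initial training run, one batch of 32 examples at a time
w, b = sgd_train(0.0, 0.0, stream_batches(5_000, batch_size=32, seed=1))
# Later: continue from the current weights on fresh data instead of restarting
w, b = sgd_train(w, b, stream_batches(5_000, batch_size=32, seed=2))
print(f"w = {w:.2f}, b = {b:.2f}")  # should end up close to w = 3, b = 2
```

Because `sgd_train` only ever sees one batch, the number of examples could grow without changing memory use, and passing the previous `(w, b)` back in is the toy version of training on additional data without starting over.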

Annotation 4761850350860

#Clinique #EBM #Médecine #Sémiologie

Pre-test probability is the probability of disease (i.e., prevalence) before the result of a physical finding is applied. Pre-test probability is the starting point for all clinical decisions. For example, the clinician may know that a certain physical finding increases the probability of disease by 40% (in absolute terms), but this information alone is unhelpful unless the clinician also knows the starting point: if the pre-test probability for the particular diagnosis was 50%, the finding is diagnostic (i.e., post-test probability 50% + 40% = 90%); if the pre-test probability was only 10%, the finding is less helpful, because the probability of disease is still akin to a coin toss (i.e., post-test probability 10% + 40% = 50%).

Annotation 4761851923724

#Clinique #EBM #Médecine #Sémiologie

Likelihood ratios (LRs) are nothing more than diagnostic weights, numbers that quickly convey to clinicians how much a physical sign argues for or against disease.
• LRs have possible values between 0 and ∞. Values greater than 1 increase the probability of disease. (The greater the value of the LR, the greater the increase in probability.) LRs less than 1 decrease the probability of disease. (The closer the number is to zero, the more the probability of disease decreases.) LRs that equal 1 do not change the probability of disease at all.
• LRs of 2, 5, and 10 increase the probability of disease by about 15%, 30%, and 45%, respectively (in absolute terms). LRs of 0.5, 0.2, and 0.1 (i.e., the reciprocals of 2, 5, and 10) decrease the probability by about 15%, 30%, and 45%, respectively.
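
The 15%, 30%, and 45% figures are a rule of thumb. Exactly, an LR multiplies the pre-test odds rather than the probability (Bayes' theorem in odds form): post-test odds = pre-test odds × LR, where odds = p / (1 − p). The sketch below, with hypothetical function names, converts a pre-test probability to odds, applies the LR, and converts back, so the rule of thumb can be compared with the exact value for any starting point.

```python
def to_odds(p: float) -> float:
    """Convert a probability in [0, 1) to odds p / (1 - p)."""
    return p / (1.0 - p)


def to_probability(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def post_test_probability(pre_test: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    return to_probability(to_odds(pre_test) * lr)


# Compare the exact shifts with the "about 15%, 30%, 45%" rule at a 50% pre-test probability
for lr in (2, 5, 10, 0.5, 0.2, 0.1):
    post = post_test_probability(0.5, lr)
    print(f"LR {lr:>4}: 50% -> {post:.0%} (change {post - 0.5:+.0%})")
```

At a 50% pre-test probability the exact changes work out to roughly 17, 33, and 41 percentage points (up for LRs of 2, 5, 10; down for 0.5, 0.2, 0.1), which is why the simpler figures are described as approximate.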

Annotation 4761932664076

The epoch is bracketed by two major events in Earth's history.

Paleocene - Wikipedia
…in the modern Cenozoic Era. The name is a combination of the Ancient Greek palæo- meaning "old" and the Eocene Epoch (which succeeds the Paleocene), translating to "the old part of the Eocene". The epoch is bracketed by two major events in Earth's history. The K-Pg extinction event, brought on by an asteroid impact and volcanism, marked the beginning of the Paleocene and killed off 75% of living species, most famously the non-avian dinosaurs…

Annotation 4762908101900

#MLBook

Let’s start by telling the truth: machines don’t learn. What a typical “learning machine” does is find a mathematical formula that, when applied to a collection of inputs (called “training data”), produces the desired outputs. This mathematical formula also generates the correct outputs for most other inputs (distinct from the training data), on the condition that those inputs come from the same or a similar statistical distribution as the one the training data was drawn from.
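
To make the "finding a formula" view concrete, here is a minimal sketch that is not taken from the book: ordinary least squares finds the line y = w·x + b that best fits a set of training pairs, and that formula is then applied to new inputs drawn from the same distribution. The data-generating rule (y = 2x + 1 plus Gaussian noise, with x in [0, 10]) and the function names are hypothetical.

```python
import random
from typing import List, Tuple


def fit_line(data: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Closed-form ordinary least squares for y = w * x + b."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def sample(rng: random.Random, n: int) -> List[Tuple[float, float]]:
    """Hypothetical phenomenon: y = 2x + 1 plus a little noise, with x in [0, 10]."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)))
    return data


rng = random.Random(42)
train = sample(rng, 200)  # training data
test = sample(rng, 50)    # new inputs from the same distribution

w, b = fit_line(train)    # the "learned" formula
mse = sum((w * x + b - y) ** 2 for x, y in test) / len(test)
print(f"formula: y = {w:.2f} * x + {b:.2f}, test MSE = {mse:.3f}")
```

The low error on `test` depends on those inputs following the same rule as `train`; if new inputs came from a phenomenon that behaved differently, nothing in the fitted formula would guarantee accurate outputs, which is the condition the paragraph stresses.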

Annotation 4762911509772

#MLBook #name-origin

So why the name “machine learning” then? The reason, as is often the case, is marketing: Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term in 1959 while at IBM. Just as IBM tried in the 2010s to market the term “cognitive computing” to stand out from the competition, in the 1960s IBM used the new, cool term “machine learning” to attract both clients and talented employees.

Annotation 4762913869068

#MLBook #definition #machine-learning

Machine learning is a universally recognized term that usually refers to the science and engineering of building machines capable of doing various useful things without being explicitly programmed to do so.

Annotation 4762916228364

#MLBook #brainstorming #machine-learning

The book also comes in handy when brainstorming at the beginning of a project, when you try to answer the question of whether a given technical or business problem is “machine-learnable” and, if so, which techniques you should try to solve it.

Annotation 4762918849804

#MLBook #data-origin #machine-learning

Machine learning is a subfield of computer science that is concerned with building algorithms which, to be useful, rely on a collection of examples of some phenomenon. These examples can come from nature, be handcrafted by humans, or be generated by another algorithm.

Annotation 4762921209100

#MLBook #machine-learning #types

Learning can be supervised, semi-supervised, unsupervised, or reinforcement learning.

Annotation 4762923568396

#MLBook #classes #dataset #feature-vector #label #labeled-examples #machine-learning #supervised-learning

In supervised learning, the dataset is the collection of labeled examples \(\{(\mathbf x_i, y_i)\}_{i=1}^N\). Each element \(\mathbf x_i\) among \(N\) is called a feature vector. A feature vector is a vector in which each dimension \(j = 1, \dots, D\) contains a value that describes the example somehow. That value is called a feature and is denoted as \(x^{(j)}\). For instance, if each example \(\mathbf x\) in our collection represents a person, then the first feature, \(x^{(1)}\), could contain height in cm, the second feature, \(x^{(2)}\), could contain weight in kg, \(x^{(3)}\) could contain gender, and so on. For all examples in the dataset, the feature at position \(j\) in the feature vector always contains the same kind of information. It means that if \(x^{(2)}_i\) contains weight in kg in some example \(\mathbf x_i\), then \(x^{(2)}_k\) will also contain weight in kg in every example \(\mathbf x_k\), \(k = 1, \dots, N\). The label \(y_i\) can be either an element belonging to a finite set of classes \(\{1, 2, \dots, C\}\), or a real number, or a more complex structure, like a vector, a matrix, a tree, or a graph. Unless otherwise stated, in this book \(y_i\) is either one of a finite set of classes or a real number. You can see a class as a category to which an example belongs. For instance, if your examples are email messages and your problem is spam detection, then you have two classes \(\{\mathit{spam}, \mathit{not\_spam}\}\).
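
A minimal sketch of this notation in code, under made-up assumptions: the dataset \(\{(\mathbf x_i, y_i)\}_{i=1}^N\) is a list of (feature vector, label) pairs, every feature vector holds the same \(D\) features in the same order, and \(x^{(j)}\) is read with a 1-based \(j\) to match the text. The three email features, their values, and the helper name `feature` are hypothetical; the labels reuse the two spam-detection classes from the paragraph.

```python
from typing import List, Tuple

FeatureVector = List[float]
# The finite set of classes a label y_i may take (from the spam-detection example)
CLASSES = ("spam", "not_spam")

# Hypothetical features for each email, always in this order (D = 3):
#   x^(1) = number of links, x^(2) = number of words, x^(3) = 1 if it contains "free" else 0
dataset: List[Tuple[FeatureVector, str]] = [
    ([12.0, 80.0, 1.0], "spam"),
    ([0.0, 250.0, 0.0], "not_spam"),
    ([7.0, 40.0, 1.0], "spam"),
]

N = len(dataset)
D = len(dataset[0][0])


def feature(x: FeatureVector, j: int) -> float:
    """Return x^(j) using the 1-based index j = 1, ..., D from the text."""
    if not 1 <= j <= len(x):
        raise IndexError(f"feature index j must be in 1..{len(x)}, got {j}")
    return x[j - 1]


# Every example must have D features and a label from the finite set of classes,
# so x_k^(2) means "number of words" for every k = 1, ..., N.
assert all(len(x) == D and y in CLASSES for x, y in dataset)

words = [feature(x, 2) for x, _ in dataset]
print(f"N = {N}, D = {D}, x^(2) across examples: {words}")
```

Keeping the feature order fixed is what lets a single index \(j\) mean the same kind of information in every example.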

Edited, memorised or added to reading queue

on 07-Jan-2020 (Tue)
