

In mathematics, a field is a set on which addition, subtraction, multiplication, and division are defined and behave as they do when applied to rational and real numbers.
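The phrase "behave as they do when applied to rational and real numbers" is usually made precise by the field axioms; a standard sketch (the notation is the conventional one, not taken from this extract):

```latex
% Field axioms for (F, +, \cdot): for all a, b, c \in F,
\begin{align*}
&a + (b + c) = (a + b) + c, \quad a \cdot (b \cdot c) = (a \cdot b) \cdot c && \text{(associativity)}\\
&a + b = b + a, \quad a \cdot b = b \cdot a && \text{(commutativity)}\\
&\exists\, 0, 1 \in F,\ 0 \neq 1:\ a + 0 = a, \quad a \cdot 1 = a && \text{(identities)}\\
&\forall a\ \exists\, (-a):\ a + (-a) = 0; \quad \forall a \neq 0\ \exists\, a^{-1}:\ a \cdot a^{-1} = 1 && \text{(inverses)}\\
&a \cdot (b + c) = a \cdot b + a \cdot c && \text{(distributivity)}
\end{align*}
```

Subtraction and division are then defined as \(a - b = a + (-b)\) and \(a / b = a \cdot b^{-1}\) for \(b \neq 0\).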


Mathematical optimization selects a best element (with regard to some criterion) from some set of available alternatives.

[Figure: Nelder-Mead minimum search of Simionescu's function; simplex vertices are ordered by their value, with 1 having the lowest (best) value.]

In mathematics, computer science and operations research, mathematical optimization (alternatively spelled optimisation) or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. [1] In the simplest case, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function.
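The "simplest case" above can be sketched in a few lines of Python; the objective function and the allowed set here are toy examples, not taken from the source:

```python
# Minimal sketch of an optimization problem: choose input values from an
# allowed set and keep the one that minimizes a real-valued function.

def f(x):
    return (x - 3.0) ** 2 + 1.0  # toy objective; its minimum is at x = 3

# Allowed set: the grid -5.0, -4.9, ..., 5.0
candidates = [i / 10.0 for i in range(-50, 51)]

best = min(candidates, key=f)  # the "best element with regard to f"
print(best, f(best))
```

Real solvers replace the brute-force `min` over a grid with smarter search (gradient methods, Nelder-Mead, etc.), but the problem shape is the same: a set of alternatives and a criterion to optimize.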



A field is thus a fundamental algebraic structure, which is widely used in algebra, number theory and many other areas of mathematics. The best known fields are the field of rational numbers and the field of real numbers. The field of complex numbers is also widely used, not only in mathematics, but also in many areas of science and engineering.


The best known fields are the field of rational numbers and the field of real numbers.
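As a concrete check, the rationals are closed under all four field operations (division only by a nonzero element); Python's `fractions` module models them exactly. The specific values are illustrative:

```python
from fractions import Fraction

# The rationals are closed under +, -, *, and / (by a nonzero element):
a, b = Fraction(2, 3), Fraction(-5, 4)
results = [a + b, a - b, a * b, a / b]
print(results)

# Every result is again an exact rational number.
assert all(isinstance(r, Fraction) for r in results)
```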


Plausible reasoning aims to develop general, consistent, and unambiguous principles for inference.


All search commands can be followed by , (comma) to go to the previous searched item.

f [char] - move to the next char on the current line after the cursor
F [char] - move to the next char on the current line before the cursor
t [char] - move to before the next char on the current line after the cursor
T [char] - move to before the next char on the current line before the cursor

All these commands can be followed by ; (semicolon) to go to the next searched item, and , (comma) to go to the previous searched item.

## Insert/Appending/Editing Text

Results in insert mode:
i - start insert mode at cursor
I - insert at the beginning of the line
a - append after the cursor
A - append at the end of the line


One of the most powerful means of validating a statistical algorithm is to verify that you can recover the ground truth from simulated data.

All of these criteria are necessary but not sufficient conditions for a good fit -- in other words, they all identify problems that will ensure a bad fit, but none of them can guarantee a good fit. Recover simulated values: one of the most powerful means of validating a statistical algorithm is to verify that you can recover the ground truth from simulated data. Begin by selecting reasonable "true" values for each of your parameters, simulating data according to your model, and then trying to fit your model with the simulated data.
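The recipe above can be sketched in plain Python. The model here (a normal with known form) and the "true" parameter values are illustrative stand-ins for whatever model you are actually fitting:

```python
import random
import statistics

# 1. Pick "true" parameter values.
random.seed(0)
true_mu, true_sigma = 2.0, 0.5

# 2. Simulate data according to the model.
data = [random.gauss(true_mu, true_sigma) for _ in range(10_000)]

# 3. Fit the model to the simulated data (here, normal MLEs)
#    and check that the fit recovers the truth.
fit_mu = statistics.fmean(data)       # MLE of the mean
fit_sigma = statistics.pstdev(data)   # MLE of the standard deviation
print(fit_mu, fit_sigma)  # should land close to 2.0 and 0.5
```

If the fitted values are far from the simulated truth, the algorithm (or the model implementation) has a problem worth finding before touching real data.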


Well, first make sure your $PATH variable is doing what you want it to. You likely have a startup script called something like ~/.bash_profile or ~/.bashrc that sets this $PATH variable.

This means that the packages you can import when running python are entirely separate from the packages you can import when running ipython or a Jupyter notebook: you're using two completely independent Python installations. So how to fix this? First make sure your $PATH variable is doing what you want it to. You likely have a startup script called something like ~/.bash_profile or ~/.bashrc that sets this $PATH variable; on Windows, you can modify the user-specific environment variables. You can manually modify these if you want your system to search things in a different order.
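A quick way to see which interpreter is actually running and what search path is in effect, using only the standard library (nothing here is specific to the quoted post):

```python
import os
import sys

# The interpreter that is really executing this script:
print(sys.executable)

# The directories the shell searches for executables, in order.
# A python/ipython mismatch usually means these point at two installs.
for d in os.environ.get("PATH", "").split(os.pathsep):
    print(d)
```

Running this under both `python` and the notebook kernel makes the "two independent installations" situation visible immediately.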


A bijective function is a one-to-one (injective) and onto (surjective) mapping of a set X to a set Y.

A bijection is a function between the elements of two sets, where each element of one set is paired with exactly one element of the other set, and each element of the other set is paired with exactly one element of the first set. There are no unpaired elements. A bijection from the set X to the set Y has an inverse function from Y to X. If X and Y are finite sets, then the existence of a bijection means they have the same number of elements.
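For finite sets, the definition can be checked mechanically. A sketch, with the function encoded as a dict (the sets and mapping are illustrative):

```python
# A mapping f: X -> Y is a bijection when every element of Y is hit
# (onto) and no two inputs share an output (one-to-one).

def is_bijection(f, X, Y):
    image = [f[x] for x in X]
    return set(image) == set(Y) and len(set(image)) == len(image)

f = {1: "a", 2: "b", 3: "c"}
print(is_bijection(f, {1, 2, 3}, {"a", "b", "c"}))  # True
print(is_bijection(f, {1, 2, 3}, {"a", "b"}))       # False: image != Y
```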


The relative complement of A in B is the set of elements in B but not in A: \(B\cap A^{\complement }=B\setminus A\).
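In Python's built-in sets, the relative complement is the difference operator; the identity above can be checked within a finite universe (the concrete sets are illustrative):

```python
# Relative complement of A in B: elements of B that are not in A.
A = {2, 4, 6}
B = {1, 2, 3, 4}
print(B - A)  # B \ A

# Within a finite universe U, the complement of A is U - A,
# and B intersected with that complement equals B \ A.
U = set(range(10))
assert B & (U - A) == B - A
```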


A measure on \({\mathcal {F}}\,\) is called a probability measure if \(P(\Omega )=1\). If \({\mathcal {F}}\,\) is the Borel σ-algebra on the set of real numbers, then there is a unique probability measure on \({\mathcal {F}}\,\) for any cdf, and vice versa. The measure corresponding to a cdf is said to be induced by the cdf. This measure coincides with the pmf for discrete variables and the pdf for continuous variables.


If \({\mathcal {F}}\,\) is the Borel σ-algebra on the set of real numbers, then there is a unique probability measure on \({\mathcal {F}}\,\) for any cdf, and vice versa.
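The measure induced by a cdf \(F\) assigns to an interval \((a,b]\) the mass \(F(b)-F(a)\). A sketch with an illustrative cdf (Exponential(1), chosen for concreteness, not taken from the source):

```python
import math

def F(x):
    # cdf of an Exponential(1) random variable (illustrative choice)
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def measure(a, b):
    # mass the induced measure assigns to the interval (a, b]
    return F(b) - F(a)

print(measure(0.0, math.inf))  # total mass of the real line: 1.0
print(measure(1.0, 2.0))       # P(1 < X <= 2)
```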


According to Plato/Socrates, every intellectual act can take place only in the soul.


Sharing code frequently makes it easy for everyone to integrate changes regularly and avoid merge conflicts. Having few large commits and sharing them rarely, in contrast, makes it hard both to solve conflicts and to comprehend what happened. Don't Commit Half-Done Work: you should only commit code when it's completed. This doesn't mean you have to complete a whole, large feature before committing. Quite the contrary: split the feature's implementation into logical chunks and remember to commit early and often.


If you're tempted to commit just because you need a clean working copy (to check out a branch, pull in changes, etc.), consider using Git's "Stash" feature instead. Test Before You Commit: resist the temptation to commit something that you "think" is completed. Test it thoroughly to make sure it really is completed and has no side effects (as far as one can tell). While committing half-baked things in your local repository only requires you to forgive yourself, having your code tested is even more important when it comes to pushing/sharing your code with others.


Write Good Commit Messages: begin your message with a short summary of your changes (up to 50 characters as a guideline), and separate it from the following body by including a blank line. The body of your message should provide detailed answers to the following questions: What was the motivation for the change? How does it differ from the previous implementation? Use the imperative, present tense ("change", not "changed" or "changes") to be consistent with generated messages from commands like git merge. Version Control is not a Backup System: having your files backed up on a remote server is a nice side effect of having a version control system, but you should not use your VCS like a backup system.



Commit semantically (see "related changes"): you shouldn't just cram in files. Use Branches: branching is one of Git's most powerful features, and this is not by accident: quick and easy branching was a central requirement from day one. Branches are the perfect tool to help you avoid mixing up different lines of development. You should use branches extensively in your development workflows: for new features, bug fixes, experiments, ideas. Agree on a Workflow: Git lets you pick from a lot of different workflows.


Branches are the perfect tool to help you avoid mixing up different lines of development.


Write commit messages in the imperative, present tense to be consistent with generated messages from commands like git merge.


The body of your commit message should explain the motivation for the change and how it differs from the previous implementation.


Resist the temptation to commit something that you "think" is completed. Test it thoroughly to make sure it really is completed and has no side effects.



You should only commit code when it’s completed.


Every probability measure on a standard Borel space turns it into a standard probability space. An example of a subset of the reals which is non-Borel, due to Lusin [4] (see Sect. 62, pages 76–78), starts from the fact that every irrational number has a unique representation by an infinite continued fraction. In contrast, an example of a non-measurable set cannot be exhibited, though its existence can be proved.


In contrast to Borel sets, an example of a non-measurable set cannot be exhibited, though its existence can be proved.


The raison d'être of the measure-theoretic treatment of probability is that it unifies the discrete and the continuous cases, and makes the difference a question of which measure is used. Furthermore, it covers distributions that are neither discrete nor continuous nor mixtures of the two.


The measure-theoretic treatment also allows us to work on probabilities outside \(\mathbb {R} ^{n}\), as in the theory of stochastic processes. For example, to study Brownian motion, probability is defined on a space of functions. When it's convenient to work with a dominating measure, the Radon-Nikodym theorem is used to define a density as the Radon-Nikodym derivative of the probability distribution of interest.



Weak convergence: a sequence of random variables converges weakly towards the random variable \(X\,\) if their cdfs converge to the cdf \(F\,\) of \(X\,\) wherever \(F\,\) is continuous. Weak convergence is also called convergence in distribution. Most common shorthand notation: \(X_{n}\,{\xrightarrow {\mathcal {D}}}\,X\).


Convergence in probability: the sequence of random variables \(X_{1},X_{2},\dots \,\) is said to converge towards the random variable \(X\,\) in probability if \(\lim _{n\rightarrow \infty }P\left(\left|X_{n}-X\right|\geq \varepsilon \right)=0\) for every \(\varepsilon >0\). Most common shorthand notation: \(X_{n}\,{\xrightarrow {P}}\,X\).


Strong convergence: the sequence of random variables \(X_{1},X_{2},\dots \,\) is said to converge towards the random variable \(X\,\) strongly if \(P(\lim _{n\rightarrow \infty }X_{n}=X)=1\). Strong convergence is also known as almost sure convergence. Most common shorthand notation: \(X_{n}\,{\xrightarrow {\mathrm {a.s.} }}\,X\). As the names indicate, weak convergence is weaker than strong convergence: strong convergence implies convergence in probability, and convergence in probability implies weak convergence.
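Convergence in probability can be illustrated numerically. The example below is not from the source: it uses the sample mean of fair coin flips, which by the law of large numbers converges in probability to 0.5, so the tail probability \(P(|X_n - 0.5| \geq \varepsilon)\) shrinks as \(n\) grows:

```python
import random

random.seed(1)
EPS = 0.05

def tail_prob(n, trials=1000):
    # Monte Carlo estimate of P(|mean of n fair coin flips - 0.5| >= EPS)
    hits = 0
    for _ in range(trials):
        mean = sum(random.random() < 0.5 for _ in range(n)) / n
        if abs(mean - 0.5) >= EPS:
            hits += 1
    return hits / trials

print(tail_prob(10), tail_prob(500))  # the second is much smaller
```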




Let \((\Omega ,{\mathcal {F}},P)\) be a probability space and \((E,{\mathcal {E}})\) a measurable space. Then an \((E,{\mathcal {E}})\)-valued random variable is a measurable function \(X\colon \Omega \to E\), which means that, for every subset \(B\in {\mathcal {E}}\), its preimage \(X^{-1}(B)\in {\mathcal {F}}\), where \(X^{-1}(B)=\{\omega :X(\omega )\in B\}\). [5] This definition enables us to measure any subset \(B\in {\mathcal {E}}\) in the target space by looking at its preimage, which by assumption is measurable. In more intuitive terms, a member of \(\Omega \) is a possible outcome.
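With a finite sample space, every subset can be taken to be measurable, so the definition can be checked directly. A minimal sketch, assuming a fair die for \(\Omega\) and parity labels for \(E\) (all names here are illustrative, not from the source):

```python
# Hypothetical example: a random variable on a finite probability space,
# checked via preimages. Omega is a fair die; X maps each outcome to its
# parity in the target space E = {"even", "odd"}.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                      # sample space
P = {w: Fraction(1, 6) for w in omega}          # uniform probability measure

def X(w):
    """The random variable X: Omega -> E."""
    return "even" if w % 2 == 0 else "odd"

def preimage(B):
    """X^{-1}(B) = {w : X(w) in B} -- the set the definition requires to be measurable."""
    return {w for w in omega if X(w) in B}

def prob(B):
    """P(X in B) is, by definition, the measure of the preimage."""
    return sum(P[w] for w in preimage(B))

print(sorted(preimage({"even"})))   # [2, 4, 6]
print(prob({"even"}))               # 1/2
```

The point is that probabilities of events about \(X\) are always computed by pulling the event back to \(\Omega\), where the measure \(P\) lives.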


Weak convergence is also called convergence in distribution. Most common shorthand notation: \({\displaystyle \displaystyle X_{n}\,{\xrightarrow {\mathcal {D}}}\,X}\)


The sequence of random variables \(X_{1},X_{2},\dots \,\) is said to converge towards the random variable \(X\,\) in probability if \(\lim _{n\rightarrow \infty }P\left(\left|X_{n}-X\right|\geq \varepsilon \right)=0\) for every ε > 0. Most common shorthand notation: \({\displaystyle \displaystyle X_{n}\,{\xrightarrow {P}}\,X}\)


The sequence of random variables \(X_{1},X_{2},\dots \,\) is said to converge towards the random variable \(X\,\) strongly if \(P(\lim _{n\rightarrow \infty }X_{n}=X)=1\). Strong convergence is also known as almost sure convergence. Most common shorthand notation: \({\displaystyle \displaystyle X_{n}\,{\xrightarrow {\mathrm {a.s.} }}\,X}\)
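A companion sketch for the strong case (again our own illustrative code): almost sure convergence is a statement about individual sample paths, so here the running mean along a single simulated path settles to 1/2:

```python
# Hypothetical sketch of strong (almost sure) convergence via the strong law
# of large numbers: along a single sample path, the running mean of fair coin
# flips settles to 1/2, so the worst deviation over the tail of the path
# shrinks as n grows.
import random

rng = random.Random(42)
flips = [rng.random() < 0.5 for _ in range(20000)]

running_means = []
total = 0
for i, flip in enumerate(flips, start=1):
    total += flip
    running_means.append(total / i)

def tail_deviation(n):
    """sup over m >= n of |X_m - 1/2| along this particular path."""
    return max(abs(m - 0.5) for m in running_means[n:])

print(tail_deviation(100), tail_deviation(10000))
```

Contrast with the previous example: convergence in probability averages over many independent runs at each fixed n, while almost sure convergence pins down the behavior of each run.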

As the names indicate, weak convergence is weaker than strong convergence. In fact, strong convergence implies convergence in probability, and convergence in probability implies weak convergence.


For example, to study Brownian motion, probability is defined on a space of functions.
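A rough sketch of what "probability on a space of functions" means in practice, with an illustrative scaled random walk standing in for Brownian motion (all parameters are our assumptions, not from the source):

```python
# Hypothetical sketch: sampling from a measure on a function space. Each draw
# is an entire path -- a scaled random walk approximating Brownian motion on
# [0, 1] -- rather than a single number.
import math
import random

rng = random.Random(7)

def sample_path(steps=500):
    """One outcome in the function space: a discretized Brownian path."""
    dt = 1.0 / steps
    w, path = 0.0, [0.0]
    for _ in range(steps):
        w += rng.gauss(0.0, math.sqrt(dt))  # independent N(0, dt) increments
        path.append(w)
    return path

# Sanity check: Var(W_1) should be close to t = 1 across many sampled paths.
endpoints = [sample_path()[-1] for _ in range(500)]
mean = sum(endpoints) / len(endpoints)
var = sum((x - mean) ** 2 for x in endpoints) / len(endpoints)
print(round(var, 2))
```

Here an "outcome" \(\omega\) is a whole path, and events are sets of paths; the variance check is one consequence of the measure this construction approximates.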


The measure-theoretic treatment also allows us to work on probabilities outside \(\mathbb {R} ^{n}\), as in the theory of stochastic processes.

When it is convenient to work with a dominating measure, the Radon-Nikodym theorem is used to define a density as the Radon-Nikodym derivative of the probability distribution of interest.

The raison d'être of the measure-theoretic treatment of probability is that it unifies the discrete and the continuous cases, and makes the difference a question of which measure is used.
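A small worked example of this unification (illustrative values, not from the source): the same "integrate x against the measure" recipe computes both a discrete and a continuous expectation; only the measure changes.

```python
# Hypothetical illustration: expectation as an integral of x against a
# measure. Discrete case: a sum weighted by a pmf (counting measure).
# Continuous case: an integral weighted by a density, approximated here
# by a Riemann sum.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}          # probability mass function on {1, 2, 3}
e_discrete = sum(x * p for x, p in pmf.items())

def density(x):
    return 2 * x                         # triangular density on [0, 1]

n = 100_000
e_continuous = sum((k / n) * density(k / n) / n for k in range(n))

print(e_discrete)               # about 2.1
print(round(e_continuous, 3))   # close to 2/3 = E[X] for this density
```

In measure-theoretic terms both lines compute \(\int x \, d\mu\); the first uses a purely atomic measure, the second an absolutely continuous one.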

Furthermore, it covers distributions that are neither discrete nor continuous nor mixtures of the two; an example of such distributions could be a mix of discrete and continuous distributions.

You should now be able to press [space]w in normal mode to save a file. [space]p should paste from the system clipboard (outside of Vim). If you can't paste, it's probably because Vim was not built with the system clipboard option. To check, run vim --version and see if +clipboard exists. If it says -clipboard, you will not be able to copy from outside of Vim. For Mac users, Homebrew installs Vim with the clipboard option: install Homebrew and then run brew install vim



in the foreground or bg in the background
DELETE # deletes one character backward
!! # repeats the last command
exit # logs out of current session

# 1. Bash Basics.
export # displays all environment variables
echo $SHELL # displays the shell you're using
echo $BASH_VERSION # displays bash version
bash # if you want to use bash (type exit to go back to your normal shell)
whereis bash # finds out where bash is on


Data is immutable. Don't ever edit your raw data, especially not manually, and especially not in Excel. Don't overwrite your raw data. Don't save multiple versions of the raw data. Treat the data (and its format) as immutable. The code you write should move the raw data through a pipeline to your final analysis. You shouldn't have to run all of the steps every time you want to make a new figure (see Analysis is a DAG).

Anyone should be able to reproduce the final products with only the code in src and the data in data/raw. Also, if data is immutable, it doesn't need source control in the same way that code does. Therefore, by default, the data folder is included in the .gitignore file. If you have a small amount of data that rarely changes, you may want to include the data in the repository. Github currently warns if files are over 50MB and rejects files over 100MB.

Notebook packages like the Jupyter notebook, Beaker notebook, Zeppelin, and other literate programming tools are very effective for exploratory data analysis. However, these tools can be less effective for reproducing an analysis. When we use notebooks in our work, we often subdivide the notebooks folder. For example, notebooks/exploratory contains initial explorations, whereas notebooks/reports is more polished work that can be exported as html to the reports directory.

Options for storing/syncing large data include AWS S3 with a syncing tool (e.g., s3cmd), Git Large File Storage, Git Annex, and dat. Currently by default, we ask for an S3 bucket and use AWS CLI to sync data in the data folder with the server.

Notebooks are challenging objects for source control (e.g., diffs of the json are often not human-readable and merging is near impossible), so we recommend not collaborating directly with others on Jupyter notebooks. There are two steps we recommend for using notebooks effectively: follow a naming convention that shows the owner and the order the analysis was done in (we use the format --.ipynb, e.g., 0.3-bull-visualize-distributions.ipynb), and refactor the good parts.



Often in an analysis you have long-running steps that preprocess data or train models. If these steps have been run already (and you have stored the output somewhere like the data/interim directory), you don't want to wait to rerun them every time.

There are other tools for managing DAGs that are written in Python instead of a DSL (e.g., Paver, Luigi, Airflow, Snakemake, Ruffus, or Joblib). Feel free to use these if they are more appropriate for your analysis.

You need the same tools, the same libraries, and the same versions to make everything play nicely together. One effective approach to this is to use virtualenv (we recommend virtualenvwrapper for managing virtualenvs). Here is a good workflow: run mkvirtualenv when creating a new project, pip install the packages that your analysis needs, then run pip freeze > requirements.txt to pin the exact package versions.

Keep secrets and configuration out of version control. You really don't want to leak your AWS secret key or Postgres username and password on Github. Enough said: see the Twelve Factor App principles on this point.


Setting the default editor and then using git commit -e might be much more comfortable.


Best Practice 1: Provide metadata. Provide metadata for both human users and computer applications.

Providing metadata is a fundamental requirement when publishing data on the Web because data publishers and data consumers may be unknown to each other.

It is essential to provide information that helps human users and computer applications to understand the data, as well as other important aspects that describe a dataset or a distribution.

Store your secrets and config variables in a special file: create a .env file in the project root folder. Thanks to the .gitignore, this file should never get committed into the version control repository.
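The example .env file is truncated in this extract; here is a reconstruction from the fragment in the source (all values are placeholders, not real credentials):

```
# example .env file
DATABASE_URL=postgres://username:password@localhost:5432/dbname
AWS_ACCESS_KEY=myaccesskey
AWS_SECRET_ACCESS_KEY=mysecretkey
```

Code can then read these values from the environment at runtime (one common option is the python-dotenv package) without the secrets ever entering version control.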


By listing all of your requirements in the repository (we include a requirements.txt file) you can easily track the packages needed to recreate the analysis.


The first step in reproducing an analysis is always reproducing the computational environment it was run in.


We prefer make for managing steps that depend on each other, especially the long-running ones.

Make is a common tool on Unix-based platforms (and is available for Windows). Following the make documentation, Makefile conventions, and portability guide will help ensure your Makefiles work.
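A minimal sketch of the idea (the target and file names below are hypothetical, not from the source, apart from src/data/make_dataset.py): each rule names its prerequisites, so make reruns only the steps whose inputs have changed.

```make
# hypothetical Makefile: an analysis pipeline expressed as a DAG
data/interim/clean.csv: data/raw/input.csv src/data/make_dataset.py
	python src/data/make_dataset.py data/raw/input.csv data/interim/clean.csv

models/model.pkl: data/interim/clean.csv src/models/train_model.py
	python src/models/train_model.py data/interim/clean.csv models/model.pkl

.PHONY: all
all: models/model.pkl
```

Running make all after editing only the training script would rebuild the model while leaving the cleaned data untouched, which is exactly the "don't rerun long-running steps" behavior described above.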

Now by default we turn the project into a Python package (see the setup.py file). You can import your code and use it in notebooks with a cell like the following:
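The cell itself is truncated in this extract; judging from the surviving comments, it looks roughly like the following (the final import is our hypothetical illustration of importing from the src package):

```python
# OPTIONAL: Load the "autoreload" extension so that code can change
%load_ext autoreload

# OPTIONAL: always reload modules so that as you change code in src, it gets loaded
%autoreload 2

from src.data import make_dataset
```

Note this is an IPython/Jupyter cell, not plain Python: the %-prefixed lines are notebook magics.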


Refactor the good parts. Don't write code to do the same task in multiple notebooks. If it's a data preprocessing task, put it in the pipeline at src/data/make_dataset.py and load data from data/interim. If it's useful utility code, refactor it to src.


Follow a naming convention that shows the owner and the order the analysis was done in.


When we use notebooks in our work, we often subdivide the notebooks folder.


Notebooks are for exploration and communication

for storing/syncing large data include AWS S3 with a syncing tool (e.g., s3cmd ), Git Large File Storage, Git Annex, and dat. Currently by default, we ask for an S3 bucket and use AWS CLI to sync data in the data folder with the server. <span>Notebooks are for exploration and communication Notebook packages like the Jupyter notebook, Beaker notebook, Zeppelin, and other literate programming tools are very effective for exploratory data analysis. However, these tools can


Therefore, by default, the data folder is included in the .gitignore file.

You shouldn't have to run all of the steps every time you want to make a new figure (see Analysis is a DAG), but anyone should be able to reproduce the final products with only the code in src and the data in data/raw. Also, if data is immutable, it doesn't need source control in the same way that code does. Therefore, by default, the data folder is included in the .gitignore file. If you have a small amount of data that rarely changes, you may want to include the data in the repository. Github currently warns if files are over 50MB and rejects files over 100MB.
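As a concrete sketch (the exact entry depends on the project layout), the default described above corresponds to a .gitignore line such as:

```
# Keep the data folder out of version control; the data itself is
# synced from the store (e.g., S3), not committed to git.
data/
```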


Treat the data (and its format) as immutable.

Data is immutable. Don't ever edit your raw data, especially not manually, and especially not in Excel. Don't overwrite your raw data. Don't save multiple versions of the raw data. Treat the data (and its format) as immutable. The code you write should move the raw data through a pipeline to your final analysis. You shouldn't have to run all of the steps every time you want to make a new figure (see Analysis is a DAG).
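A minimal sketch of this idea, with hypothetical file names and a made-up cleaning step: the pipeline only ever reads the raw file and writes a new derived file.

```python
from pathlib import Path

# Hypothetical paths; the raw file is treated as read-only input.
RAW = Path("data/raw/input.csv")
PROCESSED = Path("data/processed/clean.csv")

def make_dataset(raw, out):
    # Move raw data through the pipeline: read raw, write derived.
    # The raw file is never modified or overwritten.
    out.parent.mkdir(parents=True, exist_ok=True)
    lines = raw.read_text().splitlines()
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    out.write_text("\n".join(cleaned) + "\n")

if RAW.exists():
    make_dataset(RAW, PROCESSED)
```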


A makefile specifies how to derive the target program. Though integrated development environments and language-specific compiler features can also be used to manage a build process, Make remains widely used, especially in Unix and Unix-like operating systems. Besides building programs, Make can be used to manage any project where some files must be updated automatically from others whenever the others change.


The transformation actions for a target might be to convert the file to some specific format, copy the result into a content management system, and then send e-mail to a predefined set of users indicating that the above actions were performed. Make is invoked with a list of target file names to build as command-line arguments: make [TARGET ...]. Without arguments, Make builds the first target that appears in its makefile, which is traditionally a symbolic "phony" target named all.
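A minimal makefile sketch (target and file names hypothetical) illustrating the conventional phony all target that plain `make` builds:

```make
# "all" is the first target, so plain `make` builds it.
.PHONY: all clean
all: report.txt

# report.txt is regenerated whenever data.txt is newer.
report.txt: data.txt
	sort data.txt > report.txt

clean:
	rm -f report.txt
```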


Make decides whether a target needs to be regenerated by comparing file modification times. [31] This solves the problem of avoiding the building of files which are already up to date, but it fails when a file changes but its modification time stays in the past.


Such changes could be caused by restoring an older version of a source file, or when a network filesystem is a source of files and its clock or timezone is not synchronized with the machine running Make.
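The modification-time rule described above can be sketched in a few lines (a simplified model, not Make's actual implementation):

```python
import os

# A target is regenerated when it is missing or when any
# prerequisite has a newer modification time than the target.
def needs_rebuild(target, sources):
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(s) > target_mtime for s in sources)
```

Note how this model shows the failure mode from the text: if a source changes but its mtime stays in the past, `needs_rebuild` wrongly reports the target as up to date.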


The user must handle this situation by forcing a complete build. Conversely, if a source file's modification time is in the future, it triggers unnecessary rebuilding, which may inconvenience users. Make searches the current directory for the makefile to use; e.g., GNU make searches in order for a file named one of GNUmakefile, makefile, and Makefile, and then runs the specified (or default) target(s) from (only) that file.
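GNU make's documented search order can be sketched as a tiny lookup (a model for illustration, not how make itself is implemented):

```python
import os

# GNU make looks for, in order: GNUmakefile, makefile, Makefile,
# and uses the first one it finds in the given directory.
def find_makefile(directory="."):
    for name in ("GNUmakefile", "makefile", "Makefile"):
        if os.path.exists(os.path.join(directory, name)):
            return name
    return None
```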


The makefile language is similar to declarative programming. [32] [33] [34] [35] In this class of language, necessary end conditions are described but the order in which actions are to be taken is not important; this is sometimes confusing to programmers used to imperative programming. One problem in build automation is the tailoring of a build process to a given platform.


Make’s fundamental concepts are common across build tools. GNU Make is a free, fast, well-documented, and very popular Make implementation. From now on, we will focus on it, and when we say Make, we mean GNU Make. Key point: Make allows us to specify what depends on what and how to update things that are out of date.