status | not read | reprioritisations | ||
---|---|---|---|---|

last reprioritisation on | suggested re-reading day | |||

started reading on | finished reading on |

for a non-existent Python version. Jupyter is set up to be able to use a wide range of "kernels", or execution engines for the code. These can be Python 2, Python 3, R, Julia, Ruby... there are dozens of possible kernels to use. But in order for this to happen, Jupyter needs to know where to look for the associated executable: that is, it needs to know which path the python sits in. These paths are specified in jupyter's kernelspec, and it's possible for the user to adjust them to their desires. For example, here's the list of kernels that I have on my system: $ jupyter kernelspec list Available kernels: python2.7 /Users/jakevdp/.ipython/kernels/python2.7 python3.

ity theory is a rich and complex field of mathematics with a reputation for being confusing if not outright impenetrable. Much of that intimidation, however, is due not to the abstract mathematics but rather to how it is employed in practice. In particular, many introductions to probability theory sloppily confound the abstract mathematics with their practical implementations, convoluting what we can calculate in the theory with how we perform those calculations. To make matters even worse, probability theory is used to model a variety of subtly different systems, which then burdens the already confused mathematics with the distinct and often conflicting philosophical connotations of those applications. In this case study I attempt to untangle this pedagogical knot to illuminate the basic concepts and manipulations of probability theory. Our ultimate goal is to demystify what we can

robability theory. Let me open with a warning that the section on abstract probability theory will be devoid of any concrete examples. This is not because of any conspiracy to confuse the reader, but rather is a consequence of the fact that we cannot explicitly construct abstract probability distributions in any meaningful sense. Instead we must utilize problem-specific representations of abstract probability distributions, which means that concrete examples will have to wait until we introduce these representations in Section 3.

1 Setting A Foundation

Ultimately probability theory concerns itself wi

emented with responsibility. In particular, while dynamic implementations of Hamiltonian Monte Carlo, i.e. implementations where the integration time is dynamic, do perform well over a large class of models, their success is not guaranteed. When they do fail, however, their failures manifest in diagnostics that are readily checked. By acknowledging and respecting these diagnostics you can ensure that Stan is accurately fitting the Bayesian posterior and hence accurately characterizing your model. And only with

chains (at convergence, Rhat=1). We can investigate each more programmatically, however, using some of our utility functions.

Checking Split R̂ and Effective Sample Sizes

As noted in Section 1, the effective sample size quantifies the accuracy of the Markov chain Monte Carlo estimator of a given function, here each parameter mean, provided that geometric ergodicity holds. The potential problem with these effective sample sizes, however, is that we must estimate them from the fit output. When we generate fewer than 0.001 effective samples per transition of the Markov chain, the estimators that we use are typically biased and can significantly overestimate the true effective sample size. We can check that our effective sample size per iteration is large enough with one of our utility functions, In [14]: stan_utility.check_n_eff(fit)
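The threshold check described above can be sketched in a few lines of plain Python. This is only an illustration and not the stan_utility implementation: the helper name `low_n_eff_ratios`, its arguments, and the example numbers are all assumptions, and it presumes the effective sample sizes have already been extracted from the fit summary.

```python
# Hypothetical helper, not part of stan_utility.
def low_n_eff_ratios(n_eff_by_param, n_iter, threshold=0.001):
    """Return {parameter: n_eff / n_iter} for every parameter whose
    effective sample size per iteration falls below `threshold`."""
    if n_iter <= 0:
        raise ValueError("n_iter must be positive")
    flagged = {}
    for name, n_eff in n_eff_by_param.items():
        ratio = n_eff / n_iter
        if ratio < threshold:
            flagged[name] = ratio
    return flagged


# Made-up effective sample sizes, for illustration only.
example = {"mu": 1200.0, "tau": 2.0}
for name, ratio in low_n_eff_ratios(example, n_iter=4000).items():
    print(f"n_eff / iter for parameter {name} is {ratio:.5f}")
```

An empty result means that no parameter fell below the threshold; it does not by itself show that the estimated effective sample sizes are accurate.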

agnostics that can indicate problems with the fit. These diagnostics are extremely sensitive and typically indicate problems long before they arise in the more universal diagnostics considered above.

Checking the Tree Depth

The dynamic implementation of Hamiltonian Monte Carlo used in Stan has a maximum trajectory length built in to avoid infinite loops that can occur for non-identified models. For sufficiently complex models, however, Stan can saturate this threshold even if the model is identified, which limits the efficacy of the sampler. We can check whether that threshold was hit using one of our utility functions, In [17]: stan_utility.check_treedepth(fit) 0 of 4000 iterat
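As a rough sketch of what such a check involves, the snippet below counts how many iterations reached the maximum tree depth. The function name `count_saturated` and the example depths are hypothetical, and the code assumes the per-iteration tree depths have already been pulled out of the fit as a plain list; it is not the stan_utility code.

```python
# Hypothetical helper, not part of stan_utility.
def count_saturated(treedepths, max_depth=10):
    """Count iterations whose tree depth reached `max_depth`."""
    return sum(1 for depth in treedepths if depth >= max_depth)


# Made-up tree depths, for illustration only.
depths = [3, 4, 10, 10, 5]
n_saturated = count_saturated(depths, max_depth=10)
print(f"{n_saturated} of {len(depths)} iterations saturated the maximum tree depth")
```

If the sampler is rerun with a larger limit, the same limit should be passed as `max_depth` so that the count is made against the threshold actually in force.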

.sampling(data=data, seed=194838, control=dict(max_treedepth=15)) and then check whether the fit still saturates this larger threshold with stan_utility.check_treedepth(fit, 15)

Checking the E-BFMI

Hamiltonian Monte Carlo proceeds in two phases -- the algorithm first simulates a Hamiltonian trajectory that rapidly explores a slice of the target parameter space before resampling the auxiliary momenta to allow the next trajectory to explore another slice of the target parameter space. Unfortunately, the jumps between these slices induced by the momenta resamplings can be short, which often leads to slow exploration. We can identify this problem by consulting the energy Bayesian Fraction of Missing Information, In [18]: stan_utility.check_energy(fit) Chain 2: E-BFMI = 0.177681346951 E-BFMI below 0.2 indicates you may need to reparameterize your mod

ction of Missing Information, In [18]: stan_utility.check_energy(fit) Chain 2: E-BFMI = 0.177681346951 E-BFMI below 0.2 indicates you may need to reparameterize your model The stan_utility module uses the threshold of 0.2 to diagnose problems, although this is based on preliminary empirical studies and should be taken only as a very rough recommendation. In particular, this diagnostic comes out of recent theoretical work an
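To make the quantity behind this diagnostic concrete, here is a minimal sketch of the usual E-BFMI estimator for a single chain: the sum of squared differences between successive energies divided by the sum of squared deviations of the energies from their mean. The function name `e_bfmi` and the example energies are assumptions made for illustration, and the code presumes the per-iteration energies of one chain are available as a list of floats.

```python
# Hypothetical helper, not part of stan_utility.
def e_bfmi(energies):
    """Estimate the energy Bayesian fraction of missing information
    from the sequence of energies of a single chain."""
    n = len(energies)
    if n < 2:
        raise ValueError("need at least two energies")
    mean = sum(energies) / n
    # Variation introduced by the momentum resamplings between iterations.
    numer = sum((energies[i] - energies[i - 1]) ** 2 for i in range(1, n))
    # Overall variation of the energy across the chain.
    denom = sum((e - mean) ** 2 for e in energies)
    if denom == 0:
        raise ValueError("energies are constant")
    return numer / denom


# Made-up energies that drift slowly, so successive jumps are small
# relative to the overall spread and the estimate comes out low.
slow = [float(k) for k in range(10)]
print(e_bfmi(slow) < 0.2)
```

The 0.2 cutoff used in the comparison is the rough threshold quoted above, not a property of the estimator itself.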

ow E-BFMI are remedied by tweaking the specification of the model. Unfortunately the exact tweaks required depend on the exact structure of the model and, consequently, there are no generic solutions.

Checking Divergences

Finally, we can check divergences, which indicate pathological neighborhoods of the posterior that the simulated Hamiltonian trajectories are not able to explore sufficiently well. For this fit we have a significant number of divergences In [19]: stan_utility.check_div(fit) 202.0 of 4000 iterations ended with a divergen

nce (5.05%) Try running with larger adapt_delta to remove the divergences indicating that the Markov chains did not completely explore the posterior and that our Markov chain Monte Carlo estimators will be biased. Divergences, however, can sometimes be false positives. To verify that we have real fitting issues we can rerun with a larger target acceptance probability, adapt_delta, which will force more accurate simulations of Hamiltonian trajectories and reduce the false positives. In [20]: fit = model.sampling(data=data, seed=194838, control=dict(adapt_delta=0.9)) Checking again, In [21]: sampler_params = fit.get_sam

45.0 of 4000 iterations ended with a divergence (1.125%) Try running with larger adapt_delta to remove the divergences we see that while the divergences were reduced they did not completely vanish. In order to argue that divergences are only false positives, the divergences have to be completely eliminated for some adapt_delta sufficiently close to 1. Here we could continue increasing adapt_delta, where we would see that the divergences do not completely vanish, or we can analyze the existing divergences graphically. If the dive

ompletely eliminated for some adapt_delta sufficiently close to 1. Here we could continue increasing adapt_delta, where we would see that the divergences do not completely vanish, or we can analyze the existing divergences graphically. If the divergences are not false positives then they will tend to concentrate in the pathological neighborhoods of the posterior. Falsely positive divergent iterations, however, will follow the same distribution as non-divergent iterations. Here we will use the partition_div function of the stan_utility module to separate divergent and non-divergent iterations, but note that this function works only if your model pa
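The idea of splitting draws by their divergence flag can be sketched independently of any particular fit object. The helper below is hypothetical (it is not the partition_div function mentioned above) and assumes you already have one list of parameter draws and a matching list of 0/1 divergence flags; the example values are made up.

```python
# Hypothetical helper, not the stan_utility partition_div function.
def partition_by_divergence(draws, divergent_flags):
    """Split `draws` into (non_divergent, divergent) using per-iteration flags."""
    if len(draws) != len(divergent_flags):
        raise ValueError("draws and flags must have the same length")
    divergent = [d for d, flag in zip(draws, divergent_flags) if flag]
    non_divergent = [d for d, flag in zip(draws, divergent_flags) if not flag]
    return non_divergent, divergent


# Made-up draws of a scale parameter and their divergence flags.
tau_draws = [0.1, 2.0, 0.2, 3.0]
flags = [1, 0, 1, 0]
non_div, div = partition_by_divergence(tau_draws, flags)
print("non-divergent:", non_div)
print("divergent:", div)
```

Comparing where the two groups fall, for example in a scatter plot, is then a matter of plotting the two lists with different colors.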

color = green, alpha=0.5) plot.gca().set_xlabel("theta_1") plot.gca().set_ylabel("tau") plot.show() WARNING:root:`dtypes` ignored when `permuted` is False. One of the challenges with a visual analysis of divergences is determining exactly which parameters to examine. Consequently, visual analyses are most useful when there are already components of the model about which you are suspicious, as in this case where we know that the correlation between random effects (theta_1 through theta_8) and the hierarchical standard deviation, tau, can be problematic. Indeed we

be problematic. Indeed we see the divergences clustering towards small values of tau where the posterior abruptly stops. This abrupt stop is indicative of a transition into a pathological neighborhood that Stan was not able to penetrate. In order to avoid this issue we have to consider a modification to our model, and in this case we can appeal to a non-centered parameterization of the same model that does not suffer these issues.

A Successful Fit

Multiple diagnostics have indicated that our fit of the centered parameterization of our hierarchical model is not to be trusted, so let's instead c

Quantifying Three Years of Reading
Posted on December 29, 2017

Since December 2014, I have tracked the books I read in a Google spreadsheet. It recently occurred to me to use this data to quantify how my reading habits have changed over time. This post will use PyMC3 to model my reading habits. %matplotlib inline from it

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

Since December 2014, I have tracked the books I read in a Google spreadsheet.

When you type any command at the prompt (say, python), the system has a well-defined sequence of places that it looks for the executable. This sequence is defined in a system variable called PATH, which the user can specify. To see your PATH, you can type echo $PATH.

thon installs and finds packages How Jupyter knows what Python to use For the sake of completeness, I'll try to do a quick ELI5 on each of these, so you'll know how to solve this issue in the best way for you.

1. Unix/Linux/OSX $PATH

When you type any command at the prompt (say, python), the system has a well-defined sequence of places that it looks for the executable. This sequence is defined in a system variable called PATH, which the user can specify. To see your PATH, you can type echo $PATH. The result is a list of directories on your computer, which will be searched in order for the desired executable. From your output above, I assume that it contains this: $ echo

The result is a list of directories on your computer, which will be searched in order for the desired executable.
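The "searched in order" behavior can be illustrated with a small sketch that walks a PATH-style string and returns the first matching executable. The function name `find_executable` is hypothetical, and the sketch makes simplifying assumptions: it skips empty entries and ignores Windows executable extensions. For real use, the standard library's `shutil.which` does this lookup for you.

```python
import os


# Hypothetical helper written for illustration; prefer shutil.which in practice.
def find_executable(name, path_string):
    """Return the first executable called `name` found in the directories
    listed in `path_string`, searching left to right, or None."""
    for directory in path_string.split(os.pathsep):
        if not directory:
            # Simplification: empty entries are skipped here.
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Show which "python" would be picked up from the current PATH, if any.
print(find_executable("python", os.environ.get("PATH", "")))
```

Because the search stops at the first match, reordering the directories in PATH changes which installation a command resolves to.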

mpt (say, python), the system has a well-defined sequence of places that it looks for the executable. This sequence is defined in a system variable called PATH, which the user can specify. To see your PATH, you can type echo $PATH. The result is a list of directories on your computer, which will be searched in order for the desired executable. From your output above, I assume that it contains this: $ echo $PATH /usr/bin/:/Library/Frameworks/Python.framework/Versions/3.5/bin/:/usr/local/bin/ In Windows echo %path% P

When you run python and do something like import matplotlib , Python has to play a similar game to find the package you have in mind. Similar to $PATH in unix, Python has sys.path that specifies these

many installations of the same program on your system. Changing the path is not too complicated; see e.g. How to permanently set $PATH on Linux?. Windows - How to set environment variables in Windows 10

2. How Python finds packages

When you run python and do something like import matplotlib, Python has to play a similar game to find the package you have in mind. Similar to $PATH in Unix, Python has sys.path that specifies these: $ python >>> import sys >>> sys.path ['', '/Users/jakevdp/anaconda/lib/python3.5', '/Users/jakevdp/anaconda/lib/python3.5/site-packages', ...] Some importan

by default, the first entry in sys.path is the current directory.

nix, Python has sys.path that specifies these: $ python >>> import sys >>> sys.path ['', '/Users/jakevdp/anaconda/lib/python3.5', '/Users/jakevdp/anaconda/lib/python3.5/site-packages', ...] Some important things: by default, the first entry in sys.path is the current directory. Also, unless you modify this (which you shouldn't do unless you know exactly what you're doing) you'll usually find something called site-packages in the path: this is the default pla

is the current directory. Also, unless you modify this (which you shouldn't do unless you know exactly what you're doing) you'll usually find something called site-packages in the path: this is the default place that Python puts packages when you install them using python setup.py install, or pip, or conda, or a similar means.
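A quick way to see these pieces for whichever interpreter you are actually running is sketched below. The helper names `describe_interpreter` and `module_location` are hypothetical; the calls they wrap (`sys.executable`, `sys.path`, `sysconfig.get_paths`, `importlib.util.find_spec`) are standard library. The sketch assumes top-level module names only.

```python
import importlib.util
import sys
import sysconfig


# Hypothetical helpers written for illustration.
def describe_interpreter():
    """Collect the running interpreter, its default install location for
    pure-Python packages, and the import search path."""
    return {
        "executable": sys.executable,
        "site_packages": sysconfig.get_paths()["purelib"],
        "search_path": list(sys.path),
    }


def module_location(name):
    """Return where a top-level module would be imported from, or None
    if this interpreter cannot find it."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


info = describe_interpreter()
print("interpreter:  ", info["executable"])
print("site-packages:", info["site_packages"])
print("json is at:   ", module_location("json"))
```

Running the same snippet under two different interpreters is a direct way to confirm whether they share a package directory or not.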

these: $ python >>> import sys >>> sys.path ['', '/Users/jakevdp/anaconda/lib/python3.5', '/Users/jakevdp/anaconda/lib/python3.5/site-packages', ...] Some important things: by default, the first entry in sys.path is the current directory. Also, unless you modify this (which you shouldn't do unless you know exactly what you're doing) you'll usually find something called site-packages in the path: this is the default place that Python puts packages when you install them using python setup.py install, or pip, or conda, or a similar means. The important thing to note is that each python installation has its own site-packages, where packages are installed for that specific Python version. In other words, if you inst

some Python packages come bundled with stand-alone scripts that you can run from the command line (examples are pip, ipython, jupyter, pep8, etc.) By default, these executables will be put in the same directory path as the Python used to install them, and are designed to w

cause it was installed on a different Python! This is why in our Twitter exchange I recommended you focus on one Python installation, and fix your $PATH so that you're only using the one you want to use. There's another component to this: some Python packages come bundled with stand-alone scripts that you can run from the command line (examples are pip, ipython, jupyter, pep8, etc.) By default, these executables will be put in the same directory path as the Python used to install them, and are designed to work only with that Python installation. That means that, as your system is set up, when you run python, you get /usr/bin/python, but when you run ipython, you get /Library/Frameworks/Python.framework/Versions/3.5/bi

Well, first make sure your $PATH variable is doing what you want it to. You likely have a startup script called something like ~/.bash_profile or ~/.bashrc that sets this $PATH variable.

s that the packages you can import when running python are entirely separate from the packages you can import when running ipython or a Jupyter notebook: you're using two completely independent Python installations. So how to fix this? Well, first make sure your $PATH variable is doing what you want it to. You likely have a startup script called something like ~/.bash_profile or ~/.bashrc that sets this $PATH variable. On Windows, you can modify the user-specific environment variables. You can manually modify that if you want your system to search things in a different order. When you first install an

But in order for this to happen, Jupyter needs to know where to look for the associated executable: that is, it needs to know which path the python sits in. These paths are specified in jupyter's kernelspec, and it's possible for the user to adjust them to their desires.
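To see which executable a given kernel points at, you can read the kernel.json file inside a kernelspec directory; its "argv" entry is the command Jupyter launches. The sketch below is a minimal illustration under that assumption. The helper name `kernel_command` is hypothetical, and the directory you pass in would be one of those reported by jupyter kernelspec list.

```python
import json
import os


# Hypothetical helper written for illustration.
def kernel_command(kernel_dir):
    """Return the launch command (the "argv" list) stored in the
    kernel.json file of a kernelspec directory."""
    with open(os.path.join(kernel_dir, "kernel.json"), encoding="utf-8") as handle:
        spec = json.load(handle)
    return spec["argv"]
```

The first element of the returned list is the executable the kernel will run, which is the path to check when a notebook appears to be using the wrong Python.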

IPython relies on the ipykernel package which contains a command to install a python kernel: for example $ python -m ipykernel install It will create a kernelspec associated with the Python executable you use to run th

specifies the kernel name, the path to the executable, and other relevant info. You can adjust kernels manually, editing the metadata inside the directories listed above. The command to install a kernel can change depending on the kernel. IPython relies on the ipykernel package which contains a command to install a python kernel: for example $ python -m ipykernel install It will create a kernelspec associated with the Python executable you use to run this command. You can then choose this kernel in the Jupyter notebook to run your code with that Python. You can see other options that ipykernel provides using the help command: $ python -m ipykernel install --help usage: ipython-kernel-install [-h] [--user] [--name NAME]
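Because the kernelspec is tied to whichever Python runs the install command, it can help to build that command from an explicit interpreter path rather than relying on whatever python resolves to. The sketch below only assembles the argument list and does not run anything; the helper name `ipykernel_install_command` and the example kernel name are hypothetical, and only the --user and --name options shown in the usage line above are used.

```python
import sys


# Hypothetical helper written for illustration; it builds the command only.
def ipykernel_install_command(name=None, user=True, python=sys.executable):
    """Assemble 'python -m ipykernel install' for a specific interpreter."""
    command = [python, "-m", "ipykernel", "install"]
    if user:
        command.append("--user")
    if name:
        command.extend(["--name", name])
    return command


# By default the command targets the interpreter running this snippet.
print(" ".join(ipykernel_install_command(name="example-kernel")))
```

The resulting list could be handed to `subprocess.run` if you want to execute it, assuming ipykernel is installed for that interpreter.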

divergences indicate pathological neighborhoods of the posterior that the simulated Hamiltonian trajectories are not able to explore sufficiently well.
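Counting divergent iterations is simple once the per-iteration flags are in hand. The sketch below is not the stan_utility check; the function name `divergence_summary` is hypothetical, and the example flags are constructed only to reproduce the 202 of 4000 count quoted in the source excerpt.

```python
# Hypothetical helper, not part of stan_utility.
def divergence_summary(divergent_flags):
    """Return (number divergent, total iterations, percentage divergent)."""
    total = len(divergent_flags)
    if total == 0:
        raise ValueError("no iterations supplied")
    n_divergent = sum(1 for flag in divergent_flags if flag)
    return n_divergent, total, 100.0 * n_divergent / total


# Synthetic flags matching the counts quoted in the text.
flags = [1] * 202 + [0] * 3798
n_divergent, total, percent = divergence_summary(flags)
print(f"{n_divergent} of {total} iterations ended with a divergence ({percent:.2f}%)")
```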

Divergences, however, can sometimes be false positives. To verify that we have real fitting issues we can rerun with a larger target acceptance probability, adapt_delta , which will force more accurate simulations of Hamiltonian trajectori

nce (5.05%) Try running with larger adapt_delta to remove the divergences indicating that the Markov chains did not completely explore the posterior and that our Markov chain Monte Carlo estimators will be biased. <span>Divergences, however, can sometimes be false positives. To verify that we have real fitting issues we can rerun with a larger target acceptance probability, adapt_delta , which will force more accurate simulations of Hamiltonian trajectories and reduce the false positives. In [20]: fit = model.sampling(data=data, seed=194838, control=dict(adapt_delta=0.9)) Checking again, In [21]: sampler_params = fit.get_sam

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

Divergences, however, can sometimes be false positives. To verify that we have real fitting issues we can rerun with a larger target acceptance probability, adapt_delta , which will force more accurate simulations of Hamiltonian trajectories and reduce the false positives. In [20]:

nce (5.05%) Try running with larger adapt_delta to remove the divergences indicating that the Markov chains did not completely explore the posterior and that our Markov chain Monte Carlo estimators will be biased. <span>Divergences, however, can sometimes be false positives. To verify that we have real fitting issues we can rerun with a larger target acceptance probability, adapt_delta , which will force more accurate simulations of Hamiltonian trajectories and reduce the false positives. In [20]: fit = model.sampling(data=data, seed=194838, control=dict(adapt_delta=0.9)) Checking again, In [21]: sampler_params = fit.get_sam

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

Divergences, however, can sometimes be false positives. To verify that we have real fitting issues we can rerun with a larger target acceptance probability, adapt_delta , which will force more accurate simulations of Hamiltonian trajectories and reduce the false positives. In [20]:

nce (5.05%) Try running with larger adapt_delta to remove the divergences indicating that the Markov chains did not completely explore the posterior and that our Markov chain Monte Carlo estimators will be biased. <span>Divergences, however, can sometimes be false positives. To verify that we have real fitting issues we can rerun with a larger target acceptance probability, adapt_delta , which will force more accurate simulations of Hamiltonian trajectories and reduce the false positives. In [20]: fit = model.sampling(data=data, seed=194838, control=dict(adapt_delta=0.9)) Checking again, In [21]: sampler_params = fit.get_sam

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

In order to argue that divergences are only false positives, the divergences have to be completely eliminated for some adapt_delta sufficiently close to 1.

45.0 of 4000 iterations ended with a divergence (1.125%) Try running with larger adapt_delta to remove the divergences we see that while the divergences were reduced they did not completely vanish. <span>In order to argue that divergences are only false positives, the divergences have to be completely eliminated for some adapt_delta sufficiently close to 1. Here we could continue increasing adapt_delta , where we would see that the divergences do not completely vanish, or we can analyze the existing divergences graphically. If the dive

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |
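
The argument above can be sketched as a search over increasing `adapt_delta` values: only if some value close to 1 removes every divergence can the earlier divergences be written off as false positives. In this sketch `count_divergences` is a hypothetical callable that would refit the model at a given `adapt_delta` and return the number of divergent iterations; `fake_refit` is a stand-in with made-up counts (only the 45 at 0.9 comes from the excerpt).

```python
def first_divergence_free(count_divergences, adapt_deltas=(0.9, 0.95, 0.99, 0.999)):
    """Return the first adapt_delta that yields zero divergences, else None."""
    for delta in adapt_deltas:
        if count_divergences(delta) == 0:
            return delta
    return None


def fake_refit(delta):
    # Hypothetical stand-in for refitting; counts other than 45 are invented.
    return {0.9: 45, 0.95: 21, 0.99: 6, 0.999: 2}[delta]


result = first_divergence_free(fake_refit)
if result is None:
    print("Divergences never vanish: treat them as real fitting issues.")
else:
    print(f"Divergences vanish at adapt_delta={result}.")
```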

If the divergences are not false positives then they will tend to concentrate in the pathological neighborhoods of the posterior. Falsely positive divergent iterations, however, will follow the same distribution as non-divergent iterations.

ompletely eliminated for some adapt_delta sufficiently close to 1. Here we could continue increasing adapt_delta , where we would see that the divergences do not completely vanish, or we can analyze the existing divergences graphically. <span>If the divergences are not false positives then they will tend to concentrate in the pathological neighborhoods of the posterior. Falsely positive divergent iterations, however, will follow the same distribution as non-divergent iterations. Here we will use the partition_div function of the stan_utility module to separate divergent and non-divergent iterations, but note that this function works only if your model pa

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |
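
To compare where divergent and non-divergent iterations fall, the draws first have to be split by their divergence flag. The sketch below shows that split on plain Python lists; `partition_by_divergence` and the toy `tau` draws are assumptions for illustration and do not reproduce the `partition_div` function from `stan_utility`.

```python
def partition_by_divergence(draws, divergent_flags):
    """Split draws into (divergent, non_divergent) using per-iteration flags."""
    if len(draws) != len(divergent_flags):
        raise ValueError("draws and divergent_flags must have the same length")
    divergent = [d for d, flag in zip(draws, divergent_flags) if flag]
    non_divergent = [d for d, flag in zip(draws, divergent_flags) if not flag]
    return divergent, non_divergent


# Toy draws of a scale parameter; the flags mark the two smallest values.
tau_draws = [2.5, 0.1, 3.2, 0.05, 1.8]
flags = [0, 1, 0, 1, 0]
div_tau, nondiv_tau = partition_by_divergence(tau_draws, flags)
print("divergent:", div_tau)
print("non-divergent:", nondiv_tau)
```

If the divergent draws cluster in one region (here, small `tau`) while the non-divergent draws spread elsewhere, that concentration is the signature of a genuine pathology rather than a false positive.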

One of the challenges with a visual analysis of divergences is determining exactly which parameters to examine. Consequently visual analyses are most useful when there are already components of the model about which you are suspicious

color = green, alpha=0.5) plot.gca().set_xlabel("theta_1") plot.gca().set_ylabel("tau") plot.show() WARNING:root:`dtypes` ignored when `permuted` is False. <span>One of the challenges with a visual analysis of divergences is determining exactly which parameters to examine. Consequently visual analyses are most useful when there are already components of the model about which you are suspicious, as in this case where we know that the correlation between random effects ( theta_1 through theta_8 ) and the hierarchical standard deviation, tau , can be problematic. Indeed we

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

The stan_utility module uses the threshold of 0.2 to diagnose problems

ction of Missing Information, In [18]: stan_utility.check_energy(fit) Chain 2: E-BFMI = 0.177681346951 E-BFMI below 0.2 indicates you may need to reparameterize your model <span>The stan_utility module uses the threshold of 0.2 to diagnose problems, although this is based on preliminary empirical studies and should be taken only as a very rough recommendation. In particular, this diagnostic comes out of recent theoretical work an

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |
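
A minimal sketch of how an E-BFMI value can be estimated from the energies of one chain, assuming the usual estimator: the sum of squared differences between successive energies divided by the sum of squared deviations of the energies from their mean. The function name `e_bfmi` and the toy energy sequences are illustrative; the 0.2 threshold is the rough recommendation quoted above.

```python
def e_bfmi(energies):
    """Estimate the energy Bayesian fraction of missing information for one chain."""
    n = len(energies)
    if n < 2:
        raise ValueError("need at least two energy values")
    mean = sum(energies) / n
    # Energy changes induced by the momentum resamplings between iterations.
    numer = sum((energies[i] - energies[i - 1]) ** 2 for i in range(1, n))
    # Overall spread of the energy across the chain.
    denom = sum((e - mean) ** 2 for e in energies)
    if denom == 0:
        raise ValueError("energies are constant; E-BFMI is undefined")
    return numer / denom


fast_mixing = [0.0, 1.0, 0.0, 1.0]       # energy jumps across its whole range
slow_drift = [float(i) for i in range(100)]  # energy creeps in tiny steps
for name, chain in [("fast_mixing", fast_mixing), ("slow_drift", slow_drift)]:
    value = e_bfmi(chain)
    note = " (below 0.2)" if value < 0.2 else ""
    print(f"{name}: E-BFMI = {value:.4f}{note}")
```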

ory to explore another slice of the target parameter space. Unfortunately, the jumps between these slices induced by the momenta resamplings can be short, which often leads to slow exploration. We can identify this problem by consulting the <span>energy Bayesian Fraction of Missing Information,

.sampling(data=data, seed=194838, control=dict(max_treedepth=15)) and then check whether the fit still saturated this larger threshold with stan_utility.check_treedepth(fit, 15) Checking the E-BFMI¶ <span>Hamiltonian Monte Carlo proceeds in two phases -- the algorithm first simulates a Hamiltonian trajectory that rapidly explores a slice of the target parameter space before resampling the auxiliary momenta to allow the next trajectory to explore another slice of the target parameter space. Unfortunately, the jumps between these slices induced by the momenta resamplings can be short, which often leads to slow exploration. We can identify this problem by consulting the energy Bayesian Fraction of Missing Information, In [18]: stan_utility.check_energy(fit) Chain 2: E-BFMI = 0.177681346951 E-BFMI below 0.2 indicates you may need to reparameterize your mod

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

Hamiltonian Monte Carlo proceeds in two phases -- the algorithm first simulates a Hamiltonian trajectory that rapidly explores a slice of the target parameter space before resampling the auxiliary momenta to allow the next tr

.sampling(data=data, seed=194838, control=dict(max_treedepth=15)) and then check whether the fit still saturated this larger threshold with stan_utility.check_treedepth(fit, 15) Checking the E-BFMI¶ <span>Hamiltonian Monte Carlo proceeds in two phases -- the algorithm first simulates a Hamiltonian trajectory that rapidly explores a slice of the target parameter space before resampling the auxiliary momenta to allow the next trajectory to explore another slice of the target parameter space. Unfortunately, the jumps between these slices induced by the momenta resamplings can be short, which often leads to slow exploration. We can identify this problem by consulting the energy Bayesian Fraction of Missing Information, In [18]: stan_utility.check_energy(fit) Chain 2: E-BFMI = 0.177681346951 E-BFMI below 0.2 indicates you may need to reparameterize your mod

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

Hamiltonian Monte Carlo proceeds in two phases -- the algorithm first simulates a Hamiltonian trajectory that rapidly explores a slice of the target parameter space before resampling the auxiliary momenta to allow the next trajectory to explore another slice of the target parameter space. Unfortunately, the jumps between these slices induced by the

.sampling(data=data, seed=194838, control=dict(max_treedepth=15)) and then check whether the fit still saturated this larger threshold with stan_utility.check_treedepth(fit, 15) Checking the E-BFMI¶ <span>Hamiltonian Monte Carlo proceeds in two phases -- the algorithm first simulates a Hamiltonian trajectory that rapidly explores a slice of the target parameter space before resampling the auxiliary momenta to allow the next trajectory to explore another slice of the target parameter space. Unfortunately, the jumps between these slices induced by the momenta resamplings can be short, which often leads to slow exploration. We can identify this problem by consulting the energy Bayesian Fraction of Missing Information, In [18]: stan_utility.check_energy(fit) Chain 2: E-BFMI = 0.177681346951 E-BFMI below 0.2 indicates you may need to reparameterize your mod

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

Hamiltonian Monte Carlo proceeds in two phases -- the algorithm first simulates a Hamiltonian trajectory that rapidly explores a slice of the target parameter space before resampling the auxiliary momenta to allow the next trajectory to explore another slice of the target parameter space. Unfortunately, the jumps between these slices induced by the momenta resamplings can be short, which

.sampling(data=data, seed=194838, control=dict(max_treedepth=15)) and then check whether the fit still saturated this larger threshold with stan_utility.check_treedepth(fit, 15) Checking the E-BFMI¶ <span>Hamiltonian Monte Carlo proceeds in two phases -- the algorithm first simulates a Hamiltonian trajectory that rapidly explores a slice of the target parameter space before resampling the auxiliary momenta to allow the next trajectory to explore another slice of the target parameter space. Unfortunately, the jumps between these slices induced by the momenta resamplings can be short, which often leads to slow exploration. We can identify this problem by consulting the energy Bayesian Fraction of Missing Information, In [18]: stan_utility.check_energy(fit) Chain 2: E-BFMI = 0.177681346951 E-BFMI below 0.2 indicates you may need to reparameterize your mod

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

rst simulates a Hamiltonian trajectory that rapidly explores a slice of the target parameter space before resampling the auxiliary momenta to allow the next trajectory to explore another slice of the target parameter space. Unfortunately, the <span>jumps between these slices induced by the momenta resamplings can be short, which often leads to slow exploration. We can identify this problem by consulting the energy Bayesian Fraction of Missing Information,

.sampling(data=data, seed=194838, control=dict(max_treedepth=15)) and then check whether the fit still saturated this larger threshold with stan_utility.check_treedepth(fit, 15) Checking the E-BFMI¶ <span>Hamiltonian Monte Carlo proceeds in two phases -- the algorithm first simulates a Hamiltonian trajectory that rapidly explores a slice of the target parameter space before resampling the auxiliary momenta to allow the next trajectory to explore another slice of the target parameter space. Unfortunately, the jumps between these slices induced by the momenta resamplings can be short, which often leads to slow exploration. We can identify this problem by consulting the energy Bayesian Fraction of Missing Information, In [18]: stan_utility.check_energy(fit) Chain 2: E-BFMI = 0.177681346951 E-BFMI below 0.2 indicates you may need to reparameterize your mod

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

The dynamic implementation of Hamiltonian Monte Carlo used in Stan has a maximum trajectory length built in to avoid infinite loops that can occur for non-identified models. For sufficiently complex models, however, Stan can saturate this threshold even if the model is identified, wh

agnostics that can indicate problems with the fit. These diagnostics are extremely sensitive and typically indicate problems long before they arise in the more universal diagnostics considered above. Checking the Tree Depth¶ <span>The dynamic implementation of Hamiltonian Monte Carlo used in Stan has a maximum trajectory length built in to avoid infinite loops that can occur for non-identified models. For sufficiently complex models, however, Stan can saturate this threshold even if the model is identified, which limits the efficacy of the sampler. We can check whether that threshold was hit using one of our utility functions, In [17]: stan_utility.check_treedepth(fit) 0 of 4000 iterat

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |
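
A minimal sketch of a tree depth check, assuming the sampler records the tree depth reached on each iteration (PyStan 2 reports it as `treedepth__`) and that the maximum is Stan's default of 10 unless raised through `max_treedepth`. The helper name `count_treedepth_saturation` and the toy depths are illustrative assumptions.

```python
def count_treedepth_saturation(treedepths, max_treedepth=10):
    """Return (saturated iterations, total iterations, percentage saturated)."""
    n_total = len(treedepths)
    n_hit = sum(1 for depth in treedepths if depth >= max_treedepth)
    percent = 100.0 * n_hit / n_total if n_total else 0.0
    return n_hit, n_total, percent


# Toy per-iteration depths; two of the five iterations reach the limit.
depths = [3, 4, 10, 5, 10]
n_hit, n_total, percent = count_treedepth_saturation(depths)
print(f"{n_hit} of {n_total} iterations saturated the maximum tree depth of 10 ({percent:.1f}%)")
```

Passing a larger `max_treedepth` (for example 15, matching a refit with `control=dict(max_treedepth=15)`) checks the same draws against the raised limit.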

Split R̂ quantifies an important necessary condition for geometric ergodicity

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |
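
As a rough illustration of what split R̂ measures, the sketch below implements the classic (non rank-normalized) estimator: each chain is split in half, and the between-half variance is compared with the average within-half variance. The function name `split_rhat` and the simulated chains are assumptions for illustration; Stan's own implementation may differ in detail.

```python
import math
import random


def split_rhat(chains):
    """Classic split R-hat for equal-length chains of one scalar quantity."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[-half:])  # drops the middle draw if the length is odd
    n = len(halves[0])
    m = len(halves)
    means = [sum(h) / n for h in halves]
    grand_mean = sum(means) / m
    # Between-half variance B and average within-half variance W.
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    within = sum(
        sum((x - mu) ** 2 for x in h) / (n - 1) for h, mu in zip(halves, means)
    ) / m
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


rng = random.Random(1)
stationary = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# Shift the second half of every chain to mimic chains that have not converged.
shifted = [chain[:250] + [x + 3.0 for x in chain[250:]] for chain in stationary]
print("stationary chains:", round(split_rhat(stationary), 3))
print("shifted chains:   ", round(split_rhat(shifted), 3))
```

Splitting matters here: the shifted chains all share the same overall mean, so an unsplit comparison across chains would miss the drift that the split version exposes.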

Both large split R̂ and low effective sample size per iteration are consequences of poorly mixing Markov chains. Improving the mixing of the Markov chains almost always requires tweaking the model specification, for example with a reparameterization or stronger priors.

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

the effective sample size quantifies the accuracy of the Markov chain Monte Carlo estimator of a given function, here each parameter mean, provided that geometric ergodicity holds. The potential problem with the

chains (at convergence, Rhat=1). We can investigate each more programmatically, however, using some of our utility functions. Checking Split R̂ and Effective Sample Sizes¶ As noted in Section 1, <span>the effective sample size quantifies the accuracy of the Markov chain Monte Carlo estimator of a given function, here each parameter mean, provided that geometric ergodicity holds. The potential problem with these effective sample sizes, however, is that we must estimate them from the fit output. When we generate fewer than 0.001 effective samples per transition of the Markov chain the estimators that we use are typically biased and can significantly overestimate the true effective sample size. We can check that our effective sample size per iteration is large enough with one of our utility functions, In [14]: stan_utility.check_n_eff(fit)

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |
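
The check described above amounts to comparing each parameter's estimated effective sample size with the number of iterations. This sketch assumes the estimates are already available as a name-to-value mapping (for example, read from a fit summary); the helper name `flag_low_n_eff` and the numbers are hypothetical, and the 0.001 threshold is the one quoted in the excerpt.

```python
def flag_low_n_eff(n_eff_by_param, n_iter, threshold=0.001):
    """Return {parameter: n_eff / n_iter} for parameters below the threshold."""
    flagged = {}
    for name, n_eff in n_eff_by_param.items():
        ratio = n_eff / n_iter
        if ratio < threshold:
            flagged[name] = ratio
    return flagged


# Hypothetical effective sample sizes for a fit with 4000 post-warmup iterations.
n_eff_estimates = {"mu": 1200.0, "tau": 3.0}
flagged = flag_low_n_eff(n_eff_estimates, n_iter=4000)
for name, ratio in flagged.items():
    print(f"n_eff / iter for parameter {name} is {ratio}; the estimate is likely unreliable")
```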

we cannot explicitly construct abstract probability distributions in any meaningful sense. Instead we must utilize problem-specific representations of abstract probability distributions

robability theory. Let me open with a warning that the section on abstract probability theory will be devoid of any concrete examples. This is not because of any conspiracy to confuse the reader, but rather is a consequence of the fact that <span>we cannot explicitly construct abstract probability distributions in any meaningful sense. Instead we must utilize problem-specific representations of abstract probability distributions which means that concrete examples will have to wait until we introduce these representations in Section 3. 1 Setting A Foundation Ultimately probability theory concerns itself wi

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

In particular, many introductions to probability theory sloppily confound the abstract mathematics with their practical implementations, convoluting what we can calculate in the theory with how we perform those calculations. To make matters even worse, probability theory is used to model a variety of subtly different

ity theory is a rich and complex field of mathematics with a reputation for being confusing if not outright impenetrable. Much of that intimidation, however, is due not to the abstract mathematics but rather how they are employed in practice. <span>In particular, many introductions to probability theory sloppily confound the abstract mathematics with their practical implementations, convoluting what we can calculate in the theory with how we perform those calculations. To make matters even worse, probability theory is used to model a variety of subtly different systems, which then burdens the already confused mathematics with the distinct and often conflicting philosophical connotations of those applications. In this case study I attempt to untangle this pedagogical knot to illuminate the basic concepts and manipulations of probability theory. Our ultimate goal is to demystify what we can

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

In particular, many introductions to probability theory sloppily confound the abstract mathematics with their practical implementations, convoluting what we can calculate in the theory with how we perform those calculations. To make matters even worse, probability theory is used to model a variety of subtly different systems, which then burdens the already confused mathematics with the distinct and often

ity theory is a rich and complex field of mathematics with a reputation for being confusing if not outright impenetrable. Much of that intimidation, however, is due not to the abstract mathematics but rather how they are employed in practice. <span>In particular, many introductions to probability theory sloppily confound the abstract mathematics with their practical implementations, convoluting what we can calculate in the theory with how we perform those calculations. To make matters even worse, probability theory is used to model a variety of subtly different systems, which then burdens the already confused mathematics with the distinct and often conflicting philosophical connotations of those applications. In this case study I attempt to untangle this pedagogical knot to illuminate the basic concepts and manipulations of probability theory. Our ultimate goal is to demystify what we can

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

ppily confound the abstract mathematics with their practical implementations, convoluting what we can calculate in the theory with how we perform those calculations. To make matters even worse, probability theory is used to model a variety of <span>subtly different systems, which then burdens the already confused mathematics with the distinct and often conflicting philosophical connotations of those applications.

ity theory is a rich and complex field of mathematics with a reputation for being confusing if not outright impenetrable. Much of that intimidation, however, is due not to the abstract mathematics but rather how they are employed in practice. <span>In particular, many introductions to probability theory sloppily confound the abstract mathematics with their practical implementations, convoluting what we can calculate in the theory with how we perform those calculations. To make matters even worse, probability theory is used to model a variety of subtly different systems, which then burdens the already confused mathematics with the distinct and often conflicting philosophical connotations of those applications. In this case study I attempt to untangle this pedagogical knot to illuminate the basic concepts and manipulations of probability theory. Our ultimate goal is to demystify what we can

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

Divergences, however, can sometimes be false positives. To verify that we have real fitting issues we can rerun with a larger target acceptance probability, adapt_delta , which will force more accurate simulations of Hamiltonian trajectories and reduce the false positives. In [20]:

nce (5.05%) Try running with larger adapt_delta to remove the divergences indicating that the Markov chains did not completely explore the posterior and that our Markov chain Monte Carlo estimators will be biased. <span>Divergences, however, can sometimes be false positives. To verify that we have real fitting issues we can rerun with a larger target acceptance probability, adapt_delta , which will force more accurate simulations of Hamiltonian trajectories and reduce the false positives. In [20]: fit = model.sampling(data=data, seed=194838, control=dict(adapt_delta=0.9)) Checking again, In [21]: sampler_params = fit.get_sam