Probability Theory (For Scientists and Engineers)
Michael Betancourt
April 2018
Formal probability theory is a rich and complex field of mathematics with a reputation for being confusing if not outright impenetrable. Much of that intimidation, however, is due not to the abstract mathematics but rather to how they are employed in practice. In particular, many introductions to probability theory sloppily confound the abstract mathematics with their practical implementations, convoluting what we can calculate in the theory with how we perform those calculations. To make matters even worse, probability theory is used to model a variety of subtly different systems, which then burdens the already confused mathematics with the distinct and often conflicting philosophical connotations of those applications.
In this case study I attempt to untangle this pedagogical knot to illuminate the basic concepts and manipulations of probability theory. Our ultimate goal is to demystify what we can calculate in probability theory and how we can perform those calculations in practice. We begin with an introduction to abstract set theory, continue to probability theory, and then move on to practical implementations without any interpretational distraction. We will spend time more thoroughly reviewing sampling-based calculation methods before finally considering the classic applications of probability theory and the interpretations of the theory that then arise.
In a few places I will dip a bit deeper into the mathematics than is strictly necessary. These sections are labeled “Extra Credit” and may be skipped without any consequence, but they do provide further unification and motivation of some of the more subtle aspects of probability theory.
Let me open with a warning that the section on abstract probability theory will be devoid of any concrete examples. This is not because of any conspiracy to confuse the reader, but rather a consequence of the fact that we cannot explicitly construct abstract probability distributions in any meaningful sense. Instead we must utilize problem-specific representations of abstract probability distributions, which means that concrete examples will have to wait until we introduce these representations in Section 3.
1 Setting A Foundation
Ultimately probability theory concerns itself with sets in a given space. Before we consider the theory in any detail we will first review some of the important results from set theory that we will need. For a more complete and rigorous treatment of set theory see Folland (1999) and the appendix of Lee (2011).
A set is a collection of elements, and a space is the collection of all elements under consideration. For example, $\{1\}$, $\{5, 10, 12\}$, and $\{30, 2, 7\}$ are all sets drawn from the space of natural numbers, $\mathbb{N} = \left\{0, 1, \ldots \right\}$. A point set or atomic set is a set containing a single element, such as $\{1\}$ above. The entire space itself is always a valid set, as is the empty set or null set, $\emptyset$, which contains no elements at all.
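To make these definitions concrete, the following sketch uses Python's built-in `frozenset` as a stand-in for mathematical sets. Because a computer cannot enumerate all of $\mathbb{N}$, it assumes a finite truncation $X = \{0, 1, \ldots, 49\}$ as the space. The helper names `is_set_in_space` and `is_point_set` are hypothetical illustrations, not part of any library.

```python
# A minimal sketch: Python's built-in frozenset stands in for mathematical sets.
# Assumption: the natural numbers are truncated to a finite space
# X = {0, 1, ..., 49}, because a computer cannot enumerate the infinite space N.
X = frozenset(range(50))

# The example sets from the text, each drawn from the space X.
examples = [frozenset({1}), frozenset({5, 10, 12}), frozenset({30, 2, 7})]

def is_set_in_space(A, space):
    """Hypothetical helper: True if every element of A belongs to the space."""
    return A <= space  # frozenset's subset test

def is_point_set(A):
    """Hypothetical helper: a point (atomic) set contains exactly one element."""
    return len(A) == 1

# The empty (null) set contains no elements at all.
empty = frozenset()

for A in examples:
    print(sorted(A), is_set_in_space(A, X), is_point_set(A))

# The entire space and the empty set are always valid sets in the space.
print(is_set_in_space(X, X), is_set_in_space(empty, X))
```

The subset test `A <= space` is what "drawn from the space" means here; both the full space and the empty set pass it, mirroring the statement that they are always valid sets.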
Sets are often defined implicitly via an inclusion criterion. These sets are written with the notation
$$A = \left\{ x \in X \mid f(x) = 0 \right\},$$
which reads “the set of elements $x$ in the space $X$ such that $f(x) = 0$.”

1.1 Set Operations
There are three natural operations on sets.
Given a set, $A$, its complement, $A^{c}$, is defined as the set of all elements in the space that are not in $A$,
$$A^{c} = \left\{ x \in X \mid x \notin A \right\}.$$
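Set-builder notation maps naturally onto a set comprehension, and the complement is itself defined with that notation. The sketch below assumes a small finite space $X = \{0, \ldots, 9\}$ and a hypothetical inclusion criterion `f` that vanishes exactly on the even numbers. Both choices are illustrative assumptions, not anything prescribed by the text.

```python
# Assumption: a small finite space X = {0, ..., 9} so every set can be enumerated.
X = frozenset(range(10))

def f(x):
    """Hypothetical inclusion criterion: equals zero exactly for even x."""
    return x % 2

# A = {x in X | f(x) = 0} written as a set comprehension.
A = frozenset(x for x in X if f(x) == 0)

# Complement A^c = {x in X | x not in A}, again as a comprehension.
A_c = frozenset(x for x in X if x not in A)

# Equivalently, the complement is the set difference between the space and A.
assert A_c == X - A

print(sorted(A))
print(sorted(A_c))
```

Taking the complement relative to the space `X` matters: the comprehension only ranges over elements of `X`, so elements outside the space never appear in `A_c`. Applying the same comprehension to `A_c` recovers `A`, since the complement of the complement is the original set.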
We can construct the union of two sets, $A_{1} \cup A_{2}$, as the set of elements included in ei
...