# on 01-Mar-2017 (Wed)

#### Flashcard 1425554148620

Tags
Question

Reading 13 explains [...]—the study of how buyers and sellers interact to determine transaction prices and quantities.
the concepts and tools of demand and supply analysis

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Reading 13 explains the concepts and tools of demand and supply analysis—the study of how buyers and sellers interact to determine transaction prices and quantities.

#### Original toplevel document

Study Session 4
This study session focuses on the microeconomic principles used to describe the marketplace behavior of consumers and firms. Reading 13 explains the concepts and tools of demand and supply analysis—the study of how buyers and sellers interact to determine transaction prices and quantities. Reading 14 covers the theory of the consumer, which addresses the demand for goods and services by individuals who make decisions to maximize the satisfaction they receive fr

#### Flashcard 1480486423820

Tags
#deeplearning #neuralnetworks
Question
[...] are trained to preserve as much information as possible when an input is run through the encoder and then the decoder, but are also trained to make the new representation have various nice properties.
Autoencoders

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Autoencoders are trained to preserve as much information as possible when an input is run through the encoder and then the decoder, but are also trained to make the new representation have various nice properties.

#### Original toplevel document (pdf)

cannot see any pdfs

#### Annotation 1481711422732

#bayes #programming #r #statistics The likelihood function, although it specifies a probability at each value of θ, is not a probability distribution. In particular, it does not integrate to 1

#### pdf

cannot see any pdfs

#### Annotation 1481712995596

 #bayes #programming #r #statistics Bernoulli likelihood function really refers to a single flip

#### pdf

cannot see any pdfs

#### Annotation 1481714568460

#bayes #programming #r #statistics there are two desiderata for mathematical tractability. First, it would be convenient if the product of p(y|θ) and p(θ), which is in the numerator of Bayes’ rule, results in a function of the same form as p(θ). When this is the case, the prior and posterior beliefs are described using the same form of function. This quality allows us to include subsequent additional data and derive another posterior distribution, again of the same form as the prior.

#### pdf

cannot see any pdfs

#### Annotation 1481716141324

#bayes #programming #r #statistics Second, we desire the denominator of Bayes’ rule (Equation 5.9, p. 107), namely $$\int d\theta \, p(y|\theta)\,p(\theta)$$, to be solvable analytically. This quality also depends on how the form of the function p(θ) relates to the form of the function p(y|θ). When the forms of p(y|θ) and p(θ) combine so that the posterior distribution has the same form as the prior distribution, then p(θ) is called a conjugate prior for p(y|θ)

#### pdf

cannot see any pdfs

#### Annotation 1481717714188

#bayes #programming #r #statistics A probability density of that form is called a beta distribution. Formally, a beta distribution has two parameters, called a and b, and the density itself is defined as $$p(\theta|a,b) = \text{beta}(\theta|a,b) = \theta^{(a-1)}(1-\theta)^{(b-1)}/B(a,b)$$

#### pdf

cannot see any pdfs

#### Annotation 1481719287052

#bayes #programming #r #statistics A probability density of that form is called a beta distribution. Formally, a beta distribution has two parameters, called a and b, and the density itself is defined as $$p(\theta|a,b) = \text{beta}(\theta|a,b) = \theta^{(a-1)}(1-\theta)^{(b-1)}/B(a,b)$$ where B(a, b) is simply a normalizing constant that ensures that the area under the beta density integrates to 1.0, as all probability density functions must

#### pdf

cannot see any pdfs

#### Annotation 1481720859916

 #bayes #programming #r #statistics In other words, the normalizer for the beta distribution is the beta function $$B(a,b) = \int d\theta \space \theta^{a-1}(1-\theta)^{b-1}$$

#### pdf

cannot see any pdfs

#### Annotation 1481722432780

#bayes #programming #r #statistics In R, beta(θ|a, b) is dbeta(θ,a,b), and B(a, b) is beta(a,b)
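
A minimal check in base R (as referenced above): dbeta() should match the explicit density θ^(a−1)(1−θ)^(b−1)/B(a,b), with beta() supplying the normalizer. The values of a, b and θ below are arbitrary.

```r
# Compare dbeta() against the explicit beta density formula.
a <- 4; b <- 2
theta <- seq(0.1, 0.9, by = 0.2)
manual <- theta^(a - 1) * (1 - theta)^(b - 1) / beta(a, b)
all.equal(manual, dbeta(theta, a, b))  # TRUE
```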

#### pdf

cannot see any pdfs

#### Annotation 1481724005644

#bayes #programming #r #statistics Notice that as a gets bigger (left to right across columns of Figure 6.1), the bulk of the distribution moves rightward over higher values of θ, but as b gets bigger (top to bottom across rows of Figure 6.1), the bulk of the distribution moves leftward over lower values of θ.

#### pdf

cannot see any pdfs

#### Annotation 1481725578508

 #bayes #programming #r #statistics Notice that as a and b get bigger together, the beta distribution gets narrower.

#### pdf

cannot see any pdfs

#### Annotation 1481727151372

 #bayes #programming #r #statistics The variables a and b are called the shape parameters of the beta distribution because they determine its shape

#### pdf

cannot see any pdfs

#### Annotation 1481728724236

 #bayes #programming #r #statistics Often we think of our prior beliefs in terms of a central tendency and certainty about that central tendency. For example, in thinking about the probability of left handedness in the general population of people, we might think from everyday experience that it’s around 10%. But if we are not very certain about that value, we might consider the equivalent previous sample size to be small, say, n = 10

#### pdf

cannot see any pdfs

#### Annotation 1481730297100

 #bayes #programming #r #statistics Our goal is to convert a prior belief expressed in terms of central tendency and sample size into equivalent values of a and b in the beta distribution

#### pdf

cannot see any pdfs

#### Annotation 1481731869964

#bayes #programming #r #statistics It turns out that the mean of the beta(θ|a, b) distribution is μ = a/(a + b) and the mode is ω = (a − 1)/(a + b − 2) for a > 1 and b > 1 (μ is Greek letter mu and ω is Greek letter omega)

#### pdf

cannot see any pdfs

#### Annotation 1481733442828

#bayes #programming #r #statistics The spread of the beta distribution is related to the “concentration” κ = a + b (κ is Greek letter kappa)

#### pdf

cannot see any pdfs

#### Annotation 1481735015692

#bayes #programming #r #statistics Solving those equations for a and b yields the following formulas for a and b in terms of the mean μ, the mode ω, and the concentration κ: $$a = \mu\kappa \quad\text{and}\quad b = (1-\mu)\kappa$$ $$a = \omega(\kappa - 2) + 1 \quad\text{and}\quad b = (1-\omega)(\kappa - 2) + 1 \quad\text{for } \kappa > 2$$
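
As a sketch of how these conversions might look in R, the helper names below are illustrative (not taken from the text); they simply transcribe the formulas above for the mean/concentration and mode/concentration parameterizations.

```r
# Illustrative helpers: beta shape parameters from (mean, kappa) or (mode, kappa).
beta_ab_from_mean_kappa <- function(mu, kappa) {
  list(a = mu * kappa, b = (1 - mu) * kappa)
}
beta_ab_from_mode_kappa <- function(omega, kappa) {
  stopifnot(kappa > 2)  # the mode parameterization requires kappa > 2
  list(a = omega * (kappa - 2) + 1, b = (1 - omega) * (kappa - 2) + 1)
}
beta_ab_from_mean_kappa(mu = 0.10, kappa = 10)   # e.g., ~10% left-handedness, equivalent n = 10
beta_ab_from_mode_kappa(omega = 0.10, kappa = 10)
```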

#### pdf

cannot see any pdfs

#### Annotation 1481736588556

 #bayes #programming #r #statistics The value we choose for the prior κ can be thought of this way: It is the number of new flips of the coin that we would need to make us teeter between the new data and the prior belief about μ. If we would only need a few new flips to sway our beliefs, then our prior beliefs should be represented by a small κ. If we would need a large number of new flips to sway us away from our prior beliefs about μ, then our prior beliefs are worth a very large κ.

#### pdf

cannot see any pdfs

#### Annotation 1481738161420

 #bayes #programming #r #statistics Because the beta distribution is usually skewed, it can be more intuitive to think in terms of its mode instead of its mean. When κ is smaller, as in the left column, the beta distribution is wider than when κ is larger, as in the right column

#### pdf

cannot see any pdfs

#### Annotation 1481739734284

#bayes #programming #r #statistics For a beta density with mean μ and standard deviation σ, the shape parameters are $$a = \mu\left(\frac{\mu(1-\mu)}{\sigma^2} - 1\right) \quad\text{and}\quad b = (1-\mu)\left(\frac{\mu(1-\mu)}{\sigma^2} - 1\right)$$
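
A similar sketch for the mean/standard-deviation parameterization; the function name and example values are ours, and the example echoes the beta(12, 12) note a few annotations below.

```r
# Illustrative helper: beta shape parameters from a desired mean and standard deviation.
beta_ab_from_mean_sd <- function(mu, sd) {
  stopifnot(sd^2 < mu * (1 - mu))        # otherwise a or b would be negative
  k <- mu * (1 - mu) / sd^2 - 1
  list(a = mu * k, b = (1 - mu) * k)
}
beta_ab_from_mean_sd(0.5, 0.1)           # gives a = 12, b = 12
```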

#### pdf

cannot see any pdfs

#### Annotation 1481741307148

 #bayes #programming #r #statistics the standard deviation must make sense in the context of a beta density. In particular, the standard deviation should typically be less than 0.28867

#### pdf

cannot see any pdfs

#### Annotation 1481742880012

 #bayes #programming #r #statistics a beta(θ|12, 12) distribution has a standard deviation of 0.1

#### pdf

cannot see any pdfs

#### Annotation 1481744452876

#bayes #programming #r #statistics In most applications, we will deal with beta distributions for which a ≥ 1 and b ≥ 1, that is, κ > 2. This reflects prior knowledge that the coin has a head side and a tail side.

#### pdf

cannot see any pdfs

#### Annotation 1481746025740

 #bayes #programming #r #statistics The standard deviation of the beta distribution is $$\sqrt{μ(1 − μ)/(a + b +1)}$$. Notice that the standard deviation gets smaller when the concentration κ = a + b gets larger.

#### pdf

cannot see any pdfs

#### Annotation 1481747598604

#bayes #programming #r #statistics There are some situations, however, in which it may be convenient to use beta distributions in which a < 1 and/or b < 1, or for which we cannot be confident that κ > 2. For example, we might believe that the coin is a trick coin that nearly always comes up heads or nearly always comes up tails, but we don’t know which. In these situations, we cannot use the parameterization in terms of the mode, which requires κ > 2, and instead we can use the parameterization of the beta distribution in terms of the mean

#### pdf

cannot see any pdfs

#### Annotation 1481749433612

 #bayes #programming #r #statistics If the prior distribution is beta(θ|a, b), and the data have z heads in N flips, then the posterior distribution is beta(θ|z + a, N − z + b)
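
A minimal R illustration of this conjugate update, with made-up prior and data values; it also prints the posterior mean, which the next annotation re-expresses as a weighted average.

```r
# Conjugate update: prior beta(a, b) plus z heads in N flips gives beta(z + a, N - z + b).
a <- 2; b <- 2          # prior shape parameters (arbitrary)
z <- 7; N <- 10         # observed data (arbitrary)
theta <- seq(0, 1, length.out = 201)
prior     <- dbeta(theta, a, b)
posterior <- dbeta(theta, z + a, N - z + b)   # could be plotted with plot()/lines()
(z + a) / (N + a + b)   # posterior mean
```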

#### pdf

cannot see any pdfs

#### Annotation 1481751006476

 #bayes #programming #r #statistics If the initial prior is a beta distribution, then the posterior distribution is always a beta distribution

#### pdf

cannot see any pdfs

#### Annotation 1481752579340

#bayes #programming #r #statistics It turns out that the posterior mean can be algebraically re-arranged into a weighted average of the prior mean, a/(a + b), and the data proportion, z/N, as follows: $$\underbrace{\frac{z+a}{N+a+b}}_{\text{posterior}} = \underbrace{\frac{z}{N}}_{\text{data}}\,\underbrace{\frac{N}{N+a+b}}_{\text{weight}} + \underbrace{\frac{a}{a+b}}_{\text{prior}}\,\underbrace{\frac{a+b}{N+a+b}}_{\text{weight}}$$

#### pdf

cannot see any pdfs

#### Annotation 1481754152204

 #biochem #biology #cell Any region of a protein’s surface that can interact with another molecule through sets of noncovalent bonds is called a binding site

#### pdf

cannot see any pdfs

#### Annotation 1481755725068

#biochem #biology #cell If a binding site recognizes the surface of a second protein, the tight binding of two folded polypeptide chains at this site creates a larger protein molecule with a precisely defined geometry. Each polypeptide chain in such a protein is called a protein subunit

#### pdf

cannot see any pdfs

#### Annotation 1481757297932

 #biochem #biology #cell In the simplest case, two identical folded polypeptide chains bind to each other in a “head-to-head” arrangement, forming a symmetric complex of two protein subunits (a dimer) held together by interactions between two identical binding sites.

#### pdf

cannot see any pdfs

#### Annotation 1481758870796

#biochem #biology #cell Why is a helix such a common structure in biology? As we have seen, biological structures are often formed by linking similar subunits into long, repetitive chains. If all the subunits are identical, the neighboring subunits in the chain can often fit together in only one way, adjusting their relative positions to minimize the free energy of the contact between them. As a result, each subunit is positioned in exactly the same way in relation to the next, so that subunit 3 fits onto subunit 2 in the same way that subunit 2 fits onto subunit 1, and so on. Because it is very rare for subunits to join up in a straight line, this arrangement generally results in a helix

#### pdf

cannot see any pdfs

#### Annotation 1481760443660

 #biochem #biology #cell Handedness is not affected by turning the helix upside down, but it is reversed if the helix is reflected in the mirror

#### pdf

cannot see any pdfs

#### Annotation 1481762016524

#biochem #biology #cell there are also functions that require each individual protein molecule to span a large distance. These proteins generally have a relatively simple, elongated three-dimensional structure and are commonly referred to as fibrous proteins

#### pdf

cannot see any pdfs

#### Annotation 1481763589388

#biochem #biology #cell The coiled-coil regions are capped at each end by globular domains containing binding sites. This enables this class of protein to assemble into ropelike intermediate filaments—an important component of the cytoskeleton that creates the cell’s internal structural framework

#### pdf

cannot see any pdfs

#### Annotation 1481765162252

 #biochem #biology #cell Fibrous proteins are especially abundant outside the cell, where they are a main component of the gel-like extracellular matrix that helps to bind collections of cells together to form tissues

#### pdf

cannot see any pdfs

#### Annotation 1481766735116

#biochem #biology #cell Cells secrete extracellular matrix proteins into their surroundings, where they often assemble into sheets or long fibrils. Collagen is the most abundant of these proteins in animal tissues

#### pdf

cannot see any pdfs

#### Annotation 1481768307980

 #biochem #biology #cell Collagen is a triple helix formed by three extended protein chains that wrap around one another (bottom). Many rodlike collagen molecules are cross-linked together in the extracellular space to form unextendable collagen fibrils (top) that have the tensile strength of steel. The striping on the collagen fibril is caused by the regular repeating arrangement of the collagen molecules within the fibril

#### pdf

cannot see any pdfs

#### Annotation 1481769880844

 #biochem #biology #cell Elastin polypeptide chains are cross-linked together in the extracellular space to form rubberlike, elastic fibers. Each elastin molecule uncoils into a more extended conformation when the fiber is stretched and recoils spontaneously as soon as the stretching force is relaxed. The cross-linking in the extracellular space mentioned creates covalent linkages between lysine side chains, but the chemistry is different for collagen and elastin.

#### pdf

cannot see any pdfs

#### Annotation 1481771453708

 #biochem #biology #cell As a reference, it is useful to remember that standard metal screws, which insert when turned clockwise, are right-handed.

#### pdf

cannot see any pdfs

#### Annotation 1481773026572

#biochem #biology #cell Many proteins were also known to have intrinsically disordered tails at one or the other end of a structured domain (see, for example, the histones in Figure 4–24). But the extent of such disordered structure only became clear when genomes were sequenced. This allowed bioinformatic methods to be used to analyze the amino acid sequences that genes encode, searching for disordered regions based on their unusually low hydrophobicity and relatively high net charge.

#### pdf

cannot see any pdfs

#### Annotation 1481774599436

#biochem #biology #cell it is now thought that perhaps a quarter of all eukaryotic proteins can adopt structures that are mostly disordered, fluctuating rapidly between many different conformations.

#### pdf

cannot see any pdfs

#### Annotation 1481776172300

#biochem #biology #cell What do these disordered regions do? Some known functions are illustrated in Figure 3–24. One predominant function is to form specific binding sites for other protein molecules that are of high specificity, but readily altered by protein phosphorylation, protein dephosphorylation, or any of the other covalent modifications that are triggered by cell signaling events

#### pdf

cannot see any pdfs

#### Annotation 1481777745164

#biochem #biology #cell an unstructured region can also serve as a “tether” to hold two protein domains in close proximity to facilitate their interaction.

#### pdf

cannot see any pdfs

#### Annotation 1481779318028

 #biochem #biology #cell this tethering function that allows substrates to move between active sites in large multienzyme complexes

#### pdf

cannot see any pdfs

#### Annotation 1481780890892

#biochem #biology #cell A similar tethering function allows large scaffold proteins with multiple protein-binding sites to concentrate sets of interacting proteins, both increasing reaction rates and confining their reaction to a particular site in a cell

#### pdf

cannot see any pdfs

#### Annotation 1481782463756

 #biochem #biology #cell large numbers of disordered protein chains in close proximity can create micro-regions of gel-like consistency inside the cell that restrict diffusion

#### pdf

cannot see any pdfs

#### Annotation 1481784036620

 #biochem #biology #cell the abundant nucleoporins that coat the inner surface of the nuclear pore complex form a random coil meshwork (Figure 3–24) that is critical for selective nuclear transport

#### pdf

cannot see any pdfs

#### Annotation 1481785609484

 #biochem #biology #cell lysozyme—an enzyme in tears that dissolves bacterial cell walls—retains its antibacterial activity for a long time because it is stabilized by such cross-linkages.

#### pdf

cannot see any pdfs

#### Annotation 1481787182348

#biochem #biology #cell Disulfide bonds generally fail to form in the cytosol, where a high concentration of reducing agents converts S–S bonds back to cysteine –SH groups. Apparently, proteins do not require this type of reinforcement in the relatively mild environment inside the cell

#### pdf

cannot see any pdfs

#### Annotation 1481788755212

#biochem #biology #cell The use of smaller subunits to build larger structures has several advantages:
1. A large structure built from one or a few repeating smaller subunits requires only a small amount of genetic information.
2. Both assembly and disassembly can be readily controlled, reversible processes, because the subunits associate through multiple bonds of relatively low energy.
3. Errors in the synthesis of the structure can be more easily avoided, since correction mechanisms can operate during the course of assembly to exclude malformed subunits.

#### pdf

cannot see any pdfs

#### Annotation 1481790328076

 #biochem #biology #cell These principles are dramatically illustrated in the protein coat or capsid of many simple viruses, which takes the form of a hollow sphere based on an icosahedron

#### pdf

cannot see any pdfs

#### Annotation 1481791900940

#biochem #biology #cell The first large macromolecular aggregate shown to be capable of self-assembly from its component parts was tobacco mosaic virus (TMV).

#### pdf

cannot see any pdfs

#### Annotation 1481793473804

#biochem #biology #cell In the simplest case, a long core protein or other macromolecule provides a scaffold that determines the extent of the final assembly. This is the mechanism that determines the length of the TMV particle, where the RNA chain provides the core. Similarly, a core protein interacting with actin is thought to determine the length of the thin filaments in muscle.

#### pdf

cannot see any pdfs

#### Annotation 1481795046668

 #biochem #biology #cell In these cases, part of the assembly information is provided by special enzymes and other proteins that perform the function of templates, serving as assembly factors that guide construction but take no part in the final assembled structure.

#### pdf

cannot see any pdfs

#### Annotation 1481796619532

#biochem #biology #cell These are self-propagating, stable β-sheet aggregates called amyloid fibrils. These fibrils are built from a series of identical polypeptide chains that become layered one over the other to create a continuous stack of β sheets, with the β strands oriented perpendicular to the fibril axis to form a cross-beta filament

#### pdf

cannot see any pdfs

#### Annotation 1481798192396

 #biochem #biology #cell Typically, hundreds of monomers will aggregate to form an unbranched fibrous structure that is several micrometers long and 5 to 15 nm in width

#### pdf

cannot see any pdfs

#### Annotation 1481799765260

#biochem #biology #cell A surprisingly large fraction of proteins have the potential to form such structures, because the short segment of the polypeptide chain that forms the spine of the fibril can have a variety of different sequences and follow one of several different paths (Figure 3–32). However, very few proteins will actually form this structure inside cells

#### pdf

cannot see any pdfs

#### Annotation 1481801338124

#biochem #biology #cell In normal humans, the quality control mechanisms governing proteins gradually decline with age, occasionally permitting normal proteins to form pathological aggregates. The protein aggregates may be released from dead cells and accumulate as amyloid in the extracellular matrix.

#### pdf

cannot see any pdfs

#### Annotation 1481802910988

#biochem #biology #cell the abnormal formation of highly stable amyloid fibrils is thought to play a central causative role in both Alzheimer’s and Parkinson’s diseases

#### pdf

cannot see any pdfs

#### Annotation 1481804483852

#biochem #biology #cell A set of closely related diseases—scrapie in sheep, Creutzfeldt–Jakob disease (CJD) in humans, Kuru in humans, and bovine spongiform encephalopathy (BSE) in cattle—are caused by a misfolded, aggregated form of a particular protein called PrP

#### pdf

cannot see any pdfs

#### Annotation 1481806056716

#biochem #biology #cell PrP is normally located on the outer surface of the plasma membrane, most prominently in neurons, and it has the unfortunate property of forming amyloid fibrils that are “infectious” because they convert normally folded molecules of PrP to the same pathological form

#### pdf

cannot see any pdfs

#### Annotation 1481807629580

 #biochem #biology #cell another remarkable feature of prions. These protein molecules can form several distinctively different types of amyloid fibrils from the same polypeptide chain. Moreover, each type of aggregate can be infectious, forcing normal protein molecules to adopt the same type of abnormal structure. Thus, several different “strains” of infectious particles can arise from the same polypeptide chain.

#### pdf

cannot see any pdfs

#### Annotation 1481809202444

#biochem #biology #cell Eukaryotic cells, for example, store many different peptide and protein hormones that they will secrete in specialized “secretory granules,” which package a high concentration of their cargo in dense cores with a regular structure (see Figure 13–65). We now know that these structured cores consist of amyloid fibrils, which in this case have a structure that causes them to dissolve to release soluble cargo after being secreted by exocytosis to the cell exterior

#### pdf

cannot see any pdfs

#### Annotation 1481810775308

#biochem #biology #cell Many bacteria use the amyloid structure in a very different way, secreting proteins that form long amyloid fibrils projecting from the cell exterior that help to bind bacterial neighbors into biofilms

#### pdf

cannot see any pdfs

#### Annotation 1481812348172

 #biochem #biology #cell these biofilms help bacteria to survive in adverse environments (including in humans treated with antibiotics), new drugs that specifically disrupt the fibrous networks formed by bacterial amyloids have promise for treating human infections

#### pdf

cannot see any pdfs

#### Annotation 1481813921036

#biochem #biology #cell new experiments reveal that a large set of low complexity domains can form amyloid fibers that have functional roles in both the cell nucleus and the cell cytoplasm

#### pdf

cannot see any pdfs

#### Annotation 1481815493900

 #biochem #biology #cell these newly discovered structures are held together by weaker noncovalent bonds and readily dissociate in response to signals—hence their name reversible amyloids.

#### pdf

cannot see any pdfs

#### Annotation 1481817066764

 #biochem #biology #cell hormones of the endocrine system, such as glucagon and calcitonin, are efficiently stored as short amyloid fibrils, which dissociate when they reach the cell exterior.

#### pdf

cannot see any pdfs

#### Annotation 1481818639628

#deeplearning #neuralnetworks Depending on the structure of the problem, it may not be possible to design a unique mapping from A to B. If A is taller than it is wide, then it is possible for this equation to have no solution. If A is wider than it is tall, then there could be multiple possible solutions. The Moore-Penrose pseudoinverse allows us to make some headway in these cases. The pseudoinverse of A is defined as a matrix $$A^+ = \lim_{\alpha \to 0}\,(A^\top A + \alpha I)^{-1} A^\top$$

#### pdf

cannot see any pdfs

#### Annotation 1481822309644

#deeplearning #neuralnetworks Practical algorithms for computing the pseudoinverse are not based on this definition, but rather the formula $$A^+ = V D^+ U^\top$$ where U, D and V are the singular value decomposition of A, and the pseudoinverse D⁺ of a diagonal matrix D is obtained by taking the reciprocal of its non-zero elements then taking the transpose of the resulting matrix
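
A small R sketch of this recipe, assuming MASS is available for ginv() as a reference implementation; the matrix A is random toy data.

```r
# Moore-Penrose pseudoinverse from the SVD: A+ = V D+ t(U), compared against MASS::ginv().
library(MASS)
set.seed(1)
A <- matrix(rnorm(6), nrow = 2, ncol = 3)                  # wider than tall
s <- svd(A)
D_plus <- diag(ifelse(s$d > 1e-12, 1 / s$d, 0), nrow = length(s$d))  # reciprocate non-zero singular values
A_plus <- s$v %*% D_plus %*% t(s$u)
all.equal(A_plus, ginv(A))                                  # TRUE (up to numerical tolerance)
```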

#### pdf

cannot see any pdfs

#### Annotation 1481823882508

#deeplearning #neuralnetworks When A has more columns than rows, then solving a linear equation using the pseudoinverse provides one of the many possible solutions. Specifically, it provides the solution $$x = A^+ y$$ with minimal Euclidean norm $$\|x\|_2$$ among all possible solutions

#### pdf

cannot see any pdfs

#### Annotation 1481825455372

#deeplearning #neuralnetworks When A has more rows than columns, it is possible for there to be no solution. In this case, using the pseudoinverse gives us the x for which Ax is as close as possible to y in terms of the Euclidean norm $$\|Ax - y\|_2$$

#### pdf

cannot see any pdfs

#### Annotation 1481827028236

#deeplearning #neuralnetworks The trace operator gives the sum of all of the diagonal entries of a matrix: $$\mathrm{Tr}(A) = \sum_i A_{i,i}$$

#### pdf

cannot see any pdfs

#### Annotation 1481828601100

#deeplearning #neuralnetworks the trace operator provides an alternative way of writing the Frobenius norm of a matrix: $$\|A\|_F = \sqrt{\mathrm{Tr}(AA^\top)}$$

#### pdf

cannot see any pdfs

#### Annotation 1481830173964

#deeplearning #neuralnetworks the trace operator is invariant to the transpose operator: $$\mathrm{Tr}(A) = \mathrm{Tr}(A^\top)$$

#### pdf

cannot see any pdfs

#### Annotation 1481831746828

#deeplearning #neuralnetworks The trace of a square matrix composed of many factors is also invariant to moving the last factor into the first position, if the shapes of the corresponding matrices allow the resulting product to be defined: $$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$$

#### pdf

cannot see any pdfs

#### Annotation 1481833319692

#deeplearning #neuralnetworks This invariance to cyclic permutation holds even if the resulting product has a different shape. For example, for $$A \in \mathbb{R}^{m \times n}$$ and $$B \in \mathbb{R}^{n \times m}$$, we have $$\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$$
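
A quick numerical check in R of the cyclic property for non-square matrices, together with the Frobenius-norm identity above; the matrices are arbitrary toy data.

```r
# Check Tr(AB) = Tr(BA) and ||A||_F = sqrt(Tr(A t(A))).
set.seed(2)
A <- matrix(rnorm(12), nrow = 3, ncol = 4)
B <- matrix(rnorm(12), nrow = 4, ncol = 3)
tr <- function(M) sum(diag(M))                         # trace as the sum of diagonal entries
all.equal(tr(A %*% B), tr(B %*% A))                    # TRUE
all.equal(sqrt(tr(A %*% t(A))), norm(A, type = "F"))   # TRUE
```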

#### pdf

cannot see any pdfs

#### Annotation 1481834892556

#deeplearning #neuralnetworks a scalar is its own trace: $$a = \mathrm{Tr}(a)$$

#### pdf

cannot see any pdfs

#### Annotation 1481836465420

#deeplearning #neuralnetworks The determinant of a square matrix, denoted det(A), is a function mapping matrices to real scalars. The determinant is equal to the product of all the eigenvalues of the matrix.

#### pdf

cannot see any pdfs

#### Annotation 1481838038284

#deeplearning #neuralnetworks The absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space.

#### pdf

cannot see any pdfs

#### Annotation 1481839611148

#deeplearning #neuralnetworks If the determinant is 0, then space is contracted completely along at least one dimension, causing it to lose all of its volume. If the determinant is 1, then the transformation preserves volume

#### pdf

cannot see any pdfs

#### Annotation 1481841184012

#deeplearning #neuralnetworks One simple machine learning algorithm, principal components analysis or PCA, can be derived using only knowledge of basic linear algebra

#### pdf

cannot see any pdfs

#### Annotation 1481842756876

#deeplearning #neuralnetworks Lossy compression means storing the points in a way that requires less memory but may lose some precision

#### pdf

cannot see any pdfs

#### Annotation 1481844329740

#deeplearning #neuralnetworks PCA is defined by our choice of the decoding function. Specifically, to make the decoder very simple, we choose to use matrix multiplication to map the code back into $$\mathbb{R}^n$$. Let $$g(c) = Dc$$, where $$D \in \mathbb{R}^{n \times l}$$ is the matrix defining the decoding

#### pdf

cannot see any pdfs

#### Annotation 1481845902604

#deeplearning #neuralnetworks To keep the encoding problem easy, PCA constrains the columns of D to be orthogonal to each other.

#### pdf

cannot see any pdfs

#### Annotation 1481847475468

#deeplearning #neuralnetworks how to generate the optimal code point c* for each input point x. One way to do this is to minimize the distance between the input point x and its reconstruction, g(c*)
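
A rough R sketch of this encoder/decoder view of PCA on toy data, using prcomp() to obtain a D with orthonormal columns and l = 2; the data and dimensions are made up.

```r
# PCA as a linear encoder/decoder: code c = t(D) x, reconstruction g(c) = D c.
set.seed(3)
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)   # 100 points in R^5
X <- scale(X, center = TRUE, scale = FALSE)          # work with centered data
l <- 2
D <- prcomp(X)$rotation[, 1:l]                       # n x l, columns orthonormal
codes <- X %*% D                                     # encode each row: c = t(D) x
X_hat <- codes %*% t(D)                              # decode: g(c) = D c
mean(rowSums((X - X_hat)^2))                         # average squared reconstruction error
crossprod(D)                                         # ~ identity: columns are orthonormal
```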

#### pdf

cannot see any pdfs

Article 1481851669772

4.3. Implications for Financial Analysis

A company’s estimates for doubtful accounts and/or for warranty expenses can affect its reported net income. Similarly, a company’s choice of depreciation or amortisation method, estimates of assets’ useful lives, and estimates of assets’ residual values can affect reported net income. These are only a few of the choices and estimates that affect a company’s reported net income. As with revenue recognition policies, a company’s choice of expense recognition can be characterized by its relative conservatism. A policy that results in recognition of expenses later rather than sooner is considered less conservative. In addition, many items of expense require the company to make estimates that can significantly affect net income. Analysis of a company’s financial statements, and particularly comparison of one company’s financial statements with those of another, requires an understanding of differences in these estimates and their potential impact. If, for example, a company shows a significant ye

#### Flashcard 1481852980492

Tags
Question
A company’s estimates for doubtful accounts and/or for warranty expenses can affect its reported [...].
net income.

status measured difficulty not learned 37% [default] 0
4.3. Implications for Financial Analysis
A company’s estimates for doubtful accounts and/or for warranty expenses can affect its reported net income. Similarly, a company’s choice of depreciation or amortisation method, estimates of assets’ useful lives, and estimates of assets’ residual values can affect reported net income. These a

#### Flashcard 1481855339788

Tags
Question
As with revenue recognition policies, a company’s choice of expense recognition can be characterized by its relative [...]
conservatism

status measured difficulty not learned 37% [default] 0
4.3. Implications for Financial Analysis
on or amortisation method, estimates of assets’ useful lives, and estimates of assets’ residual values can affect reported net income. These are only a few of the choices and estimates that affect a company’s reported net income. As with revenue recognition policies, a company’s choice of expense recognition can be characterized by its relative conservatism. A policy that results in recognition of expenses later rather than sooner is considered less conservative. In addition, many items of expense require the company to make estimates that

#### Flashcard 1481857699084

Tags
Question
A policy that results in recognition of expenses later rather than sooner is considered [...] conservative.
less

status measured difficulty not learned 37% [default] 0
4.3. Implications for Financial Analysis
me. These are only a few of the choices and estimates that affect a company’s reported net income. As with revenue recognition policies, a company’s choice of expense recognition can be characterized by its relative conservatism. A policy that results in recognition of expenses later rather than sooner is considered less conservative. In addition, many items of expense require the company to make estimates that can significantly affect net income. Analysis of a company’s financial statements, and particularly compari

#### Annotation 1481860058380

 #cfa-level-1 #expense-recognition #reading-25-understanding-income-statement In addition, many items of expense require the company to make estimates that can significantly affect net income. Analysis of a company’s financial statements, and particularly comparison of one company’s financial statements with those of another, requires an understanding of differences in these estimates and their potential impact.

4.3. Implications for Financial Analysis
As with revenue recognition policies, a company’s choice of expense recognition can be characterized by its relative conservatism. A policy that results in recognition of expenses later rather than sooner is considered less conservative. In addition, many items of expense require the company to make estimates that can significantly affect net income. Analysis of a company’s financial statements, and particularly comparison of one company’s financial statements with those of another, requires an understanding of differences in these estimates and their potential impact. If, for example, a company shows a significant year-to-year change in its estimates of uncollectible accounts as a percentage of sales, warranty expenses as a percentage of

#### Annotation 1481861631244

 #cfa-level-1 #expense-recognition #reading-25-understanding-income-statement If, for example, a company shows a significant year-to-year change in its estimates of uncollectible accounts as a percentage of sales, warranty expenses as a percentage of sales, or estimated useful lives of assets, the analyst should seek to understand the underlying reasons. Do the changes reflect a change in business operations (e.g., lower estimated warranty expenses reflecting recent experience of fewer warranty claims because of improved product quality)? Or are the changes seemingly unrelated to changes in business operations and thus possibly a signal that a company is manipulating estimates in order to achieve a particular effect on its reported net income?

4.3. Implications for Financial Analysis
ncome. Analysis of a company’s financial statements, and particularly comparison of one company’s financial statements with those of another, requires an understanding of differences in these estimates and their potential impact. If, for example, a company shows a significant year-to-year change in its estimates of uncollectible accounts as a percentage of sales, warranty expenses as a percentage of sales, or estimated useful lives of assets, the analyst should seek to understand the underlying reasons. Do the changes reflect a change in business operations (e.g., lower estimated warranty expenses reflecting recent experience of fewer warranty claims because of improved product quality

#### Flashcard 1481869233420

Tags
Question
When possible, the [...] of differences in expense recognition policies and estimates can facilitate more meaningful comparisons with a single company’s historical performance or across a number of companies.
monetary effect

status measured difficulty not learned 37% [default] 0
4.3. Implications for Financial Analysis
? Information about a company’s accounting policies and significant estimates are described in the notes to the financial statements and in the management discussion and analysis section of a company’s annual report. When possible, the monetary effect of differences in expense recognition policies and estimates can facilitate more meaningful comparisons with a single company’s historical performance or across a number of companies. An analyst can use the monetary effect to adjust the reported expenses so that they are on a comparable basis. Even when the monetary effects of differences in policies and

#### Flashcard 1481871592716

Tags
Question
An analyst can use the [...] to adjust the reported expenses so that they are on a comparable basis.
monetary effect

status measured difficulty not learned 37% [default] 0
4.3. Implications for Financial Analysis
al report. When possible, the monetary effect of differences in expense recognition policies and estimates can facilitate more meaningful comparisons with a single company’s historical performance or across a number of companies. An analyst can use the monetary effect to adjust the reported expenses so that they are on a comparable basis. Even when the monetary effects of differences in policies and estimates cannot be calculated, it is generally possible to characterize the relative conservatism of the poli

#### Annotation 1481909865740

#deeplearning #neuralnetworks There are three possible sources of uncertainty:
1. Inherent stochasticity in the system being modeled. For example, most interpretations of quantum mechanics describe the dynamics of subatomic particles as being probabilistic. We can also create theoretical scenarios that we postulate to have random dynamics, such as a hypothetical card game where we assume that the cards are truly shuffled into a random order.
2. Incomplete observability. Even deterministic systems can appear stochastic when we cannot observe all of the variables that drive the behavior of the system. For example, in the Monty Hall problem, a game show contestant is asked to choose between three doors and wins a prize held behind the chosen door. Two doors lead to a goat while a third leads to a car. The outcome given the contestant’s choice is deterministic, but from the contestant’s point of view, the outcome is uncertain.
3. Incomplete modeling. When we use a model that must discard some of the information we have observed, the discarded information results in uncertainty in the model’s predictions.

#### pdf

cannot see any pdfs

#### Annotation 1481911438604

#deeplearning #neuralnetworks When we say that an outcome has a probability p of occurring, it means that if we repeated the experiment (e.g., draw a hand of cards) infinitely many times, then proportion p of the repetitions would result in that outcome.

#### pdf

cannot see any pdfs

#### Annotation 1481913011468

#deeplearning #neuralnetworks In the case of the doctor diagnosing the patient, we use probability to represent a degree of belief, with 1 indicating absolute certainty that the patient has the flu and 0 indicating absolute certainty that the patient does not have the flu.

#### pdf

cannot see any pdfs

#### Annotation 1481914584332

#deeplearning #neuralnetworks The former kind of probability, related directly to the rates at which events occur, is known as frequentist probability, while the latter, related to qualitative levels of certainty, is known as Bayesian probability

#### pdf

cannot see any pdfs

#### Annotation 1481916157196

#deeplearning #neuralnetworks A random variable is a variable that can take on different values randomly

#### pdf

cannot see any pdfs

#### Annotation 1481917730060

#deeplearning #neuralnetworks We typically denote the random variable itself with a lowercase letter in plain typeface, and the values it can take on with lowercase script letters

#### pdf

cannot see any pdfs

#### Annotation 1481919302924

#deeplearning #neuralnetworks For vector-valued variables, we would write the random variable as a bold **x** and one of its values as *x*

#### pdf

cannot see any pdfs

#### Annotation 1481921137932

#deeplearning #neuralnetworks On its own, a random variable is just a description of the states that are possible; it must be coupled with a probability distribution that specifies how likely each of these states are.

#### pdf

cannot see any pdfs

#### Annotation 1481922710796

#deeplearning #neuralnetworks A probability distribution is a description of how likely a random variable or set of random variables is to take on each of its possible states.

#### pdf

cannot see any pdfs

#### Annotation 1481924283660

#deeplearning #neuralnetworks A probability distribution over discrete variables may be described using a probability mass function (PMF)

#### pdf

cannot see any pdfs

#### Annotation 1481925856524

#deeplearning #neuralnetworks The probability mass function maps from a state of a random variable to the probability of that random variable taking on that state

#### pdf

cannot see any pdfs

#### Annotation 1481927429388

#deeplearning #neuralnetworks Probability mass functions can act on many variables at the same time. Such a probability distribution over many variables is known as a joint probability distribution

#### pdf

cannot see any pdfs

#### Annotation 1481929002252

#deeplearning #neuralnetworks To be a probability mass function on a random variable x, a function P must satisfy the following properties:
• The domain of P must be the set of all possible states of x.
• ∀x ∈ x, 0 ≤ P(x) ≤ 1. An impossible event has probability 0, and no state can be less probable than that. Likewise, an event that is guaranteed to happen has probability 1, and no state can have a greater chance of occurring.
• $$\sum_{x \in \mathrm{x}} P(x) = 1$$. We refer to this property as being normalized. Without this property, we could obtain probabilities greater than one by computing the probability of one of many events occurring.

#### pdf

cannot see any pdfs

#### Annotation 1481930575116

#deeplearning #neuralnetworks We can place a uniform distribution on x—that is, make each of its states equally likely—by setting its probability mass function to $$P(\mathrm{x} = x_i) = \frac{1}{k}$$

#### pdf

cannot see any pdfs

#### Annotation 1481932147980

#deeplearning #neuralnetworks To be a probability density function, a function p must satisfy the following properties:
• The domain of p must be the set of all possible states of x.
• ∀x ∈ x, p(x) ≥ 0. Note that we do not require p(x) ≤ 1.
• $$\int p(x)\,dx = 1$$

#### pdf

cannot see any pdfs

#### Annotation 1481933982988

#deeplearning #neuralnetworks The “;” notation means “parametrized by”; we consider x to be the argument of the function, while a and b are parameters that define the function

#### pdf

cannot see any pdfs

#### Annotation 1481935555852

#deeplearning #neuralnetworks We often denote that x follows the uniform distribution on [a, b] by writing x ∼ U(a, b)

#### pdf

cannot see any pdfs

#### Annotation 1481937128716

#deeplearning #neuralnetworks The probability distribution over the subset is known as the marginal probability distribution.

#### pdf

cannot see any pdfs

#### Annotation 1481938701580

#deeplearning #neuralnetworks For example, suppose we have discrete random variables x and y, and we know P(x, y). We can find P(x) with the sum rule: $$\forall x \in \mathrm{x},\; P(\mathrm{x} = x) = \sum_{y} P(\mathrm{x} = x, \mathrm{y} = y)$$

#### pdf

cannot see any pdfs

#### Annotation 1481940274444

#deeplearning #neuralnetworks The name “marginal probability” comes from the process of computing marginal probabilities on paper. When the values of P(x, y) are written in a grid with different values of x in rows and different values of y in columns, it is natural to sum across a row of the grid, then write P(x) in the margin of the paper just to the right of the row
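
A toy R illustration of this "grid with margins" picture; the joint probabilities are made up and sum to 1.

```r
# Marginalization: sum the joint P(x, y) across each row to get P(x).
P_xy <- matrix(c(0.10, 0.20,
                 0.30, 0.15,
                 0.05, 0.20), nrow = 3, byrow = TRUE,
               dimnames = list(x = c("x1", "x2", "x3"), y = c("y1", "y2")))
P_x <- rowSums(P_xy)   # marginal distribution of x, "written in the margin"
P_x
sum(P_x)               # 1, as required
```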

#### pdf

cannot see any pdfs

#### Annotation 1481941847308

#deeplearning #neuralnetworks In many cases, we are interested in the probability of some event, given that some other event has happened. This is called a conditional probability

#### pdf

cannot see any pdfs

#### Annotation 1481943420172

#deeplearning #neuralnetworks Computing the consequences of an action is called making an intervention query. Intervention queries are the domain of causal modeling

#### pdf

cannot see any pdfs

#### Annotation 1481946565900

#deeplearning #neuralnetworks Any joint probability distribution over many random variables may be decomposed into conditional distributions over only one variable: $$P(x^{(1)}, \ldots, x^{(n)}) = P(x^{(1)}) \prod_{i=2}^{n} P(x^{(i)} \mid x^{(1)}, \ldots, x^{(i-1)})$$ This observation is known as the chain rule or product rule of probability

#### pdf

cannot see any pdfs

#### Annotation 1481948663052

#deeplearning #neuralnetworks Two random variables x and y are independent if their probability distribution can be expressed as a product of two factors, one involving only x and one involving only y: $$\forall x \in \mathrm{x}, y \in \mathrm{y},\; p(\mathrm{x} = x, \mathrm{y} = y) = p(\mathrm{x} = x)\,p(\mathrm{y} = y)$$

#### pdf

cannot see any pdfs

#### Annotation 1481950498060

#deeplearning #neuralnetworks Two random variables x and y are conditionally independent given a random variable z if the conditional probability distribution over x and y factorizes in this way for every value of z: $$\forall x \in \mathrm{x}, y \in \mathrm{y}, z \in \mathrm{z},\; p(\mathrm{x} = x, \mathrm{y} = y \mid \mathrm{z} = z) = p(\mathrm{x} = x \mid \mathrm{z} = z)\,p(\mathrm{y} = y \mid \mathrm{z} = z)$$

#### pdf

cannot see any pdfs

#### Annotation 1481952070924

#deeplearning #neuralnetworks We can denote independence and conditional independence with compact notation: x⊥y means that x and y are independent, while x⊥y | z means that x and y are conditionally independent given z

#### pdf

cannot see any pdfs

#### Annotation 1481953643788

#deeplearning #neuralnetworks The expectation or expected value of some function f(x) with respect to a probability distribution P(x) is the average or mean value that f takes on when x is drawn from P. For discrete variables this can be computed with a summation: $$\mathbb{E}_{x \sim P}[f(x)] = \sum_{x} P(x) f(x)$$
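
A minimal R check of the discrete expectation formula, with a made-up PMF, plus a Monte Carlo estimate for comparison.

```r
# Discrete expectation as a weighted sum: E_{x~P}[f(x)] = sum_x P(x) f(x).
x_vals <- c(1, 2, 3, 4)
P      <- c(0.1, 0.2, 0.3, 0.4)       # a made-up PMF (sums to 1)
f      <- function(x) x^2
sum(P * f(x_vals))                     # exact expectation of f under P
mean(f(sample(x_vals, 1e5, replace = TRUE, prob = P)))   # Monte Carlo check
```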

#### pdf

cannot see any pdfs

#### Annotation 1481955216652

#deeplearning #neuralnetworks When the identity of the distribution is clear from the context, we may simply write the name of the random variable that the expectation is over, as in $$\mathbb{E}_{\mathrm{x}}[f(x)]$$

#### pdf

cannot see any pdfs

#### Annotation 1481956789516

#deeplearning #neuralnetworks Expectations are linear, for example, $$\mathbb{E}_{\mathrm{x}}[\alpha f(x) + \beta g(x)] = \alpha\,\mathbb{E}_{\mathrm{x}}[f(x)] + \beta\,\mathbb{E}_{\mathrm{x}}[g(x)]$$

#### pdf

cannot see any pdfs

#### Annotation 1481958362380

#deeplearning #neuralnetworks The variance gives a measure of how much the values of a function of a random variable x vary as we sample different values of x from its probability distribution: $$\mathrm{Var}(f(x)) = \mathbb{E}\left[(f(x) - \mathbb{E}[f(x)])^2\right]$$

#### pdf

cannot see any pdfs

#### Annotation 1481959935244

#deeplearning #neuralnetworks The covariance gives some sense of how much two values are linearly related to each other, as well as the scale of these variables: $$\mathrm{Cov}(f(x), g(y)) = \mathbb{E}\left[(f(x) - \mathbb{E}[f(x)])\,(g(y) - \mathbb{E}[g(y)])\right]$$

#### pdf

cannot see any pdfs

#### Annotation 1481961508108

#deeplearning #neuralnetworks High absolute values of the covariance mean that the values change very much and are both far from their respective means at the same time

#### pdf

cannot see any pdfs

#### Annotation 1481963080972

#deeplearning #neuralnetworks If the sign of the covariance is positive, then both variables tend to take on relatively high values simultaneously. If the sign of the covariance is negative, then one variable tends to take on a relatively high value at the times that the other takes on a relatively low value and vice versa.

#### pdf

cannot see any pdfs

#### Annotation 1481964653836

#deeplearning #neuralnetworks Other measures such as correlation normalize the contribution of each variable in order to measure only how much the variables are related, rather than also being affected by the scale of the separate variables

#### pdf

cannot see any pdfs

#### Annotation 1481966226700

#deeplearning #neuralnetworks They are related because two variables that are independent have zero covariance, and two variables that have non-zero covariance are dependent

#### pdf

cannot see any pdfs

#### Annotation 1481967799564

#deeplearning #neuralnetworks Independence is a stronger requirement than zero covariance, because independence also excludes nonlinear relationships.

#### pdf

cannot see any pdfs

#### Annotation 1481969372428

#deeplearning #neuralnetworks The covariance matrix of a random vector $$\boldsymbol{x} \in \mathbb{R}^n$$ is an n × n matrix, such that $$\mathrm{Cov}(\boldsymbol{x})_{i,j} = \mathrm{Cov}(x_i, x_j)$$ The diagonal elements of the covariance give the variance: $$\mathrm{Cov}(x_i, x_i) = \mathrm{Var}(x_i)$$
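
A small R check that the diagonal of a sample covariance matrix holds the per-variable variances; the data are simulated.

```r
# Sample covariance matrix: off-diagonals are pairwise covariances, diagonal is the variances.
set.seed(4)
X <- cbind(a = rnorm(1000), b = rnorm(1000))
X[, "b"] <- X[, "b"] + 0.5 * X[, "a"]     # induce some linear dependence
C <- cov(X)
all.equal(diag(C), apply(X, 2, var))      # TRUE: Cov(x_i, x_i) = Var(x_i)
```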

#### pdf

cannot see any pdfs

#### Annotation 1481970945292

#deeplearning #neuralnetworks The Bernoulli distribution is a distribution over a single binary random variable. It is controlled by a single parameter φ ∈ [0, 1], which gives the probability of the random variable being equal to 1. It has the following properties: $$P(\mathrm{x} = 1) = \phi$$ $$P(\mathrm{x} = 0) = 1 - \phi$$ $$P(\mathrm{x} = x) = \phi^x (1 - \phi)^{1 - x}$$ $$\mathbb{E}_{\mathrm{x}}[\mathrm{x}] = \phi$$ $$\mathrm{Var}(\mathrm{x}) = \phi(1 - \phi)$$
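
A quick empirical check of these Bernoulli properties in R, drawing samples with rbinom(); φ = 0.3 is arbitrary.

```r
# Sample mean should approach phi and sample variance should approach phi * (1 - phi).
phi <- 0.3
x <- rbinom(1e5, size = 1, prob = phi)   # Bernoulli draws
c(mean = mean(x), var = var(x))          # ~ 0.30 and ~ 0.21
phi * (1 - phi)                          # theoretical variance
```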

#### pdf

cannot see any pdfs

#### Annotation 1481972518156

#deeplearning #neuralnetworks The multinoulli or categorical distribution is a distribution over a single discrete variable with k different states, where k is finite.

#### pdf

cannot see any pdfs

#### Annotation 1481974091020

#deeplearning #neuralnetworks The multinoulli distribution is a special case of the multinomial distribution. A multinomial distribution is the distribution over vectors in {0, ..., n}^k representing how many times each of the k categories is visited when n samples are drawn from a multinoulli distribution. Many texts use the term “multinomial” to refer to multinoulli distributions without clarifying that they refer only to the n = 1 case.

#### pdf

cannot see any pdfs

#### Annotation 1481975663884

#deeplearning #neuralnetworks The most commonly used distribution over real numbers is the normal distribution, also known as the Gaussian distribution: $$\mathcal{N}(x; \mu, \sigma^2) = \sqrt{\frac{1}{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right)$$

#### pdf

cannot see any pdfs

#### Annotation 1481977236748

#deeplearning #neuralnetworks When we need to frequently evaluate the PDF with different parameter values, a more efficient way of parametrizing the distribution is to use a parameter β ∈ (0, ∞) to control the precision, or inverse variance, of the distribution: $$\mathcal{N}(x; \mu, \beta^{-1}) = \sqrt{\frac{\beta}{2\pi}}\exp\left(-\frac{1}{2}\beta(x - \mu)^2\right)$$

#### pdf

cannot see any pdfs

#### Annotation 1481979071756

#deeplearning #neuralnetworks The central limit theorem shows that the sum of many independent random variables is approximately normally distributed
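
A rough illustration in R: sums of 30 independent uniform variables already look approximately normal (toy numbers).

```r
# Sums of independent uniforms are approximately Gaussian.
set.seed(5)
sums <- replicate(1e4, sum(runif(30)))
c(mean = mean(sums), sd = sd(sums))   # ~ 15 and ~ sqrt(30/12)
# hist(sums)  # roughly bell-shaped
```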

#### pdf

cannot see any pdfs

#### Annotation 1481980644620

#deeplearning #neuralnetworks out of all possible probability distributions with the same variance, the normal distribution encodes the maximum amount of uncertainty over the real numbers

#### pdf

cannot see any pdfs

#### Annotation 1481982217484

#deeplearning #neuralnetworks We can thus think of the normal distribution as being the one that inserts the least amount of prior knowledge into a model

#### pdf

cannot see any pdfs

#### Annotation 1481984052492

#deeplearning #neuralnetworks The normal distribution generalizes to $$\mathbb{R}^n$$, in which case it is known as the multivariate normal distribution. It may be parametrized with a positive definite symmetric matrix Σ: $$\mathcal{N}(\boldsymbol{x}; \boldsymbol{\mu}, \Sigma) = \sqrt{\frac{1}{(2\pi)^n \det(\Sigma)}}\exp\left(-\frac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\boldsymbol{x} - \boldsymbol{\mu})\right)$$

#### pdf

cannot see any pdfs

#### Annotation 1481985625356

#deeplearning #neuralnetworks An even simpler version is the isotropic Gaussian distribution, whose covariance matrix is a scalar times the identity matrix

#### pdf

cannot see any pdfs

#### Annotation 1481987198220

#deeplearning #neuralnetworks In the context of deep learning, we often want to have a probability distribution with a sharp point at x = 0. To accomplish this, we can use the exponential distribution: $$p(x; \lambda) = \lambda\,\mathbf{1}_{x \ge 0}\exp(-\lambda x)$$ The exponential distribution uses the indicator function $$\mathbf{1}_{x \ge 0}$$ to assign probability zero to all negative values of x.

#### pdf

cannot see any pdfs

#### Annotation 1481988771084

#deeplearning #neuralnetworks A closely related probability distribution that allows us to place a sharp peak of probability mass at an arbitrary point µ is the Laplace distribution: $$\text{Laplace}(x; \mu, \gamma) = \frac{1}{2\gamma}\exp\left(-\frac{|x - \mu|}{\gamma}\right)$$

#### pdf

cannot see any pdfs

#### Annotation 1481990343948

#deeplearning #neuralnetworks In some cases, we wish to specify that all of the mass in a probability distribution clusters around a single point. This can be accomplished by defining a PDF using the Dirac delta function: $$p(x) = \delta(x - \mu)$$ The Dirac delta function is defined such that it is zero-valued everywhere except 0, yet integrates to 1.

#### pdf

cannot see any pdfs

#### Annotation 1481991916812

#deeplearning #neuralnetworks a mathematical object called a generalized function that is defined in terms of its properties when integrated

#### pdf

cannot see any pdfs

#### Annotation 1481993489676

#deeplearning #neuralnetworks We can think of the Dirac delta function as being the limit point of a series of functions that put less and less mass on all points other than zero.

#### pdf

cannot see any pdfs

#### Annotation 1481995062540

#deeplearning #neuralnetworks By defining p(x) to be δ shifted by −µ we obtain an infinitely narrow and infinitely high peak of probability mass where x = µ.

#### pdf

cannot see any pdfs

#### Annotation 1481996635404

#deeplearning #neuralnetworks A common use of the Dirac delta distribution is as a component of an empirical distribution, $$\hat{p}(\boldsymbol{x}) = \frac{1}{m}\sum_{i=1}^{m}\delta(\boldsymbol{x} - \boldsymbol{x}^{(i)})$$ which puts probability mass 1/m on each of the m points $$\boldsymbol{x}^{(1)}, \ldots, \boldsymbol{x}^{(m)}$$ forming a given dataset or collection of samples

#### pdf

cannot see any pdfs

#### Annotation 1481998208268

#deeplearning #neuralnetworks Another important perspective on the empirical distribution is that it is the probability density that maximizes the likelihood of the training data

#### pdf

cannot see any pdfs

#### Annotation 1482001878284

#deeplearning #neuralnetworks One common way of combining distributions is to construct a mixture distribution. A mixture distribution is made up of several component distributions. On each trial, the choice of which component distribution generates the sample is determined by sampling a component identity from a multinoulli distribution: $$P(\mathrm{x}) = \sum_i P(c = i)\,P(\mathrm{x} \mid c = i)$$ where P(c) is the multinoulli distribution over component identities
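
A sketch in R of ancestral sampling from a two-component Gaussian mixture: draw the component identity from a multinoulli, then draw x from that component; all parameter values are made up.

```r
# Sample from a mixture: component identity first, then the component's Gaussian.
set.seed(6)
weights <- c(0.3, 0.7)       # P(c = i)
mus     <- c(-2, 3)          # component means (toy values)
sds     <- c(0.5, 1.0)       # component standard deviations
comp <- sample(1:2, size = 1e4, replace = TRUE, prob = weights)   # multinoulli draw
x    <- rnorm(1e4, mean = mus[comp], sd = sds[comp])              # conditional Gaussian draw
tapply(x, comp, mean)        # ~ -2 and ~ 3
```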

#### pdf

cannot see any pdfs

#### Annotation 1482003451148

#deeplearning #neuralnetworks A latent variable is a random variable that we cannot observe directly.

#### pdf

cannot see any pdfs

#### Annotation 1482005024012

#deeplearning #neuralnetworks A very powerful and common type of mixture model is the Gaussian mixture model, in which the components p(x | c = i) are Gaussians. Each component has a separately parametrized mean $$\boldsymbol{\mu}^{(i)}$$ and covariance $$\Sigma^{(i)}$$

#### pdf

cannot see any pdfs

#### Annotation 1482006859020

#deeplearning #neuralnetworks A Gaussian mixture model is a universal approximator of densities, in the sense that any smooth density can be approximated with any specific, non-zero amount of error by a Gaussian mixture model with enough components

#### pdf

cannot see any pdfs

#### Annotation 1482008431884

#deeplearning #neuralnetworks Certain functions arise often while working with probability distributions, especially the probability distributions used in deep learning models. One of these functions is the logistic sigmoid: $$\sigma(x) = \frac{1}{1 + \exp(-x)}$$

#### pdf

cannot see any pdfs

#### Annotation 1482010004748

#deeplearning #neuralnetworks The logistic sigmoid is commonly used to produce the φ parameter of a Bernoulli distribution

#### pdf

cannot see any pdfs

#### Annotation 1482011577612

#deeplearning #neuralnetworks an isotropic covariance matrix, meaning it has the same amount of variance in each direction

#### pdf

cannot see any pdfs

#### Annotation 1482013150476

#deeplearning #neuralnetworks a diagonal covariance matrix, meaning it can control the variance separately along each axis-aligned direction.

#### pdf

cannot see any pdfs

#### Annotation 1482014723340

 #deeplearning #neuralnetworks The third component has a full-rank covariance matrix, allowing it to control the variance separately along an arbitrary basis of directions.

#### pdf

cannot see any pdfs

#### Annotation 1482016558348

 #deeplearning #neuralnetworks The sigmoid function saturates when its argument is very positive or very negative, meaning that the function becomes very flat and insensitive to small changes in its input.

#### pdf

cannot see any pdfs

#### Annotation 1482018131212

 #deeplearning #neuralnetworks Another commonly encountered function is the softplus function (Dugas et al., 2001): ζ(x) = log(1 + exp(x)). The softplus function can be useful for producing the β or σ parameter of a normal distribution because its range is (0, ∞).
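
A direct MATLAB transcription; for large x the literal form log(1 + exp(x)) overflows, so an equivalent overflow-safe form (a standard identity, not from this excerpt) is shown alongside it:

```matlab
zeta_naive = @(x) log(1 + exp(x));                    % softplus, literal definition
zeta_safe  = @(x) max(x, 0) + log1p(exp(-abs(x)));    % same function, overflow-safe
zeta_safe(-5), zeta_safe(0), zeta_safe(50)            % values lie in (0, Inf)
```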

#### pdf

cannot see any pdfs

#### Annotation 1482019704076

 #deeplearning #neuralnetworks The name of the softplus function comes from the fact that it is a smoothed or "softened" version of x+ = max(0, x).

#### pdf

cannot see any pdfs

#### Annotation 1482021276940

 #deeplearning #neuralnetworks σ(x) = exp(x) / (exp(x) + exp(0))  (3.33)
d/dx σ(x) = σ(x)(1 − σ(x))  (3.34)
1 − σ(x) = σ(−x)  (3.35)
log σ(x) = −ζ(−x)  (3.36)
d/dx ζ(x) = σ(x)  (3.37)
∀x ∈ (0, 1), σ⁻¹(x) = log(x / (1 − x))  (3.38)
∀x > 0, ζ⁻¹(x) = log(exp(x) − 1)  (3.39)
ζ(x) = ∫_{−∞}^{x} σ(y) dy  (3.40)
ζ(x) − ζ(−x) = x  (3.41)
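
A quick MATLAB spot-check of two of these identities, (3.35) and (3.41), at a few arbitrary points; both residuals should be zero up to round-off:

```matlab
sigma = @(x) 1 ./ (1 + exp(-x));
zeta  = @(x) log(1 + exp(x));
x = [-3 -0.5 0 1.2 4];
max(abs((1 - sigma(x)) - sigma(-x)))   % identity (3.35)
max(abs((zeta(x) - zeta(-x)) - x))     % identity (3.41)
```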

#### pdf

cannot see any pdfs

#### Annotation 1482022849804

 #deeplearning #neuralnetworks The function σ⁻¹(x) is called the logit in statistics, but this term is more rarely used in machine learning.

#### pdf

cannot see any pdfs

#### Annotation 1482024422668

 #deeplearning #neuralnetworks The softplus function is intended as a smoothed version of the positive part function, x+ = max{0, x}.

#### pdf

cannot see any pdfs

#### Annotation 1482025995532

 #deeplearning #neuralnetworks Just as x can be recovered from its positive part and negative part via the identity x+ − x− = x, it is also possible to recover x using the same relationship between ζ(x) and ζ(−x).

#### pdf

cannot see any pdfs

#### Annotation 1482027830540

 #deeplearning #neuralnetworks Measure theory provides a rigorous way of describing that a set of points is negligibly small. Such a set is said to have measure zero.

#### pdf

cannot see any pdfs

#### Annotation 1482029403404

 #deeplearning #neuralnetworks For our purposes, it is sufficient to understand the intuition that a set of measure zero occupies no volume in the space we are measuring. For example, within R², a line has measure zero, while a filled polygon has positive measure.

#### pdf

cannot see any pdfs

#### Annotation 1482030976268

 #deeplearning #neuralnetworks Another useful term from measure theory is almost everywhere. A property that holds almost everywhere holds throughout all of space except for on a set of measure zero.

#### pdf

cannot see any pdfs

#### Annotation 1482032549132

 #deeplearning #neuralnetworks Suppose we have two random variables, x and y, such that y = g(x), where g is an invertible, continuous, differentiable transformation. One might expect that p_y(y) = p_x(g⁻¹(y)). This is actually not the case.
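
For reference, the standard scalar change-of-variables result (a known identity, not quoted in this excerpt) is that the density must be corrected by the Jacobian of the inverse transformation:

```latex
p_y(y) = p_x\!\left(g^{-1}(y)\right)\,\left|\frac{\partial g^{-1}(y)}{\partial y}\right| .
```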

#### pdf

cannot see any pdfs

#### Annotation 1482034121996

 #deeplearning #neuralnetworks The basic intuition behind information theory is that learning that an unlikely event has occurred is more informative than learning that a likely event has occurred.

#### pdf

cannot see any pdfs

#### Annotation 1482035694860

 #deeplearning #neuralnetworks We would like to quantify information in a way that formalizes this intuition. Specifically:
• Likely events should have low information content, and in the extreme case, events that are guaranteed to happen should have no information content whatsoever.
• Less likely events should have higher information content.
• Independent events should have additive information. For example, finding out that a tossed coin has come up as heads twice should convey twice as much information as finding out that a tossed coin has come up as heads once.

#### pdf

cannot see any pdfs

#### Annotation 1482037267724

 #deeplearning #neuralnetworks In order to satisfy all three of these properties, we define the self-information of an event x = x to be I(x) = −log P(x).
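
A tiny MATLAB illustration, also showing the nat-to-bit rescaling discussed in the following annotations (the probabilities are arbitrary):

```matlab
P = [0.5 0.25 1/exp(1) 1.0];    % event probabilities
I_nats = -log(P);               % self-information in nats (natural log)
I_bits = I_nats / log(2);       % the same quantities rescaled to bits
```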

#### pdf

cannot see any pdfs

#### Annotation 1482038840588

 #deeplearning #neuralnetworks One nat is the amount of

#### pdf

cannot see any pdfs

#### Annotation 1482040413452

 #deeplearning #neuralnetworks information gained by observing an event of probability 1/e. Other texts use base-2 logarithms and units called bits or shannons; information measured in bits is just a rescaling of information measured in nats.

#### pdf

cannot see any pdfs

#### Annotation 1482041986316

 #bayes #programming #r #statistics This chapter introduces the methods we will use for producing accurate approximations to Bayesian posterior distributions for realistic applications. The class of methods is called Markov chain Monte Carlo (MCMC)

#### pdf

cannot see any pdfs

#### Annotation 1482044345612

 #bayes #programming #r #statistics The method described in this chapter assumes that the prior distribution is specified by a function that is easily evaluated. This simply means that if you specify a value for θ, then the value of p(θ) is easily determined, especially by a computer

#### pdf

cannot see any pdfs

#### Annotation 1482045918476

 #matlab #programming Recursive functions are usually written in this way: an if statement handles the general recursive definition; the else part handles the special case (n = 1)
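
A minimal sketch of this pattern, using a hypothetical factorial function (saved as fact.m): the if branch carries the general recursive definition, the else branch the special case n = 1.

```matlab
function y = fact(n)
% FACT  Recursive factorial illustrating the usual if/else structure.
if n > 1
    y = n * fact(n - 1);   % general recursive definition
else
    y = 1;                 % special (base) case, n = 1
end
end
```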

#### pdf

cannot see any pdfs

#### Annotation 1482047491340

 #matlab #programming A for loop, where the number of repetitions must be determined in advance, is sometimes called determinate repetition.

#### pdf

cannot see any pdfs

#### Annotation 1482049064204

 #matlab #programming it often happens that the condition to end a loop is only satisfied during the execution of the loop itself. Such a structure is called indeterminate.

#### pdf

cannot see any pdfs

#### Annotation 1482050637068

 #matlab #programming If there are a number of different conditions to stop a while loop you may be tempted to use a for with the number of repetitions set to some accepted cut-off value (or even Inf) but enclosing if statements which break out of the for when the various conditions are met. Why is this not regarded as the best programming style? The reason is simply that when you read the code months later you will have to wade through the whole loop to find all the conditions to end it, rather than see them all paraded at the start of the loop in the while clause
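
A small hypothetical example of the recommended style, a Newton iteration for sqrt(2) where both stopping conditions are visible in the while clause rather than buried in if/break statements:

```matlab
maxIter = 100;
tol = 1e-6;
err = Inf;
k = 0;
x = 1;                            % initial guess
while err > tol && k < maxIter    % all stopping conditions up front
    xNew = 0.5 * (x + 2 / x);     % Newton update for sqrt(2)
    err = abs(xNew - x);
    x = xNew;
    k = k + 1;
end
```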

#### pdf

cannot see any pdfs

#### Annotation 1482052209932

 #matlab #programming Graphs (in 2-D) are drawn with the plot statement. In its simplest form, it takes a single vector argument as in plot(y)

#### pdf

cannot see any pdfs

#### Annotation 1482053782796

 #matlab #programming plot(y). In this case the elements of y are plotted against their indexes, e.g., plot(rand(1, 20)) plots 20 random numbers against the integers 1–20, joining successive points with straight lines.

#### pdf

cannot see any pdfs

#### Annotation 1482055355660

 #matlab #programming Probably the most common form of plot is plot(x, y) where x and y are vectors of the same length, e.g., x = 0:pi/40:4*pi; plot(x, sin(x)) In this case, the co-ordinates of the ith point are (x_i, y_i).

#### pdf

cannot see any pdfs

#### Annotation 1482056928524

 #matlab #programming Straight-line graphs are drawn by giving the x and y co-ordinates of the end- points in two vectors.

#### pdf

cannot see any pdfs

#### Annotation 1482058501388

 #matlab #programming MATLAB has a set of 'easy-to-use' plotting commands, all starting with the string 'ez'. The easy-to-use form of plot is ezplot, e.g., ezplot('tan(x)')

#### pdf

cannot see any pdfs

#### Annotation 1482060074252

 #matlab #programming gtext('text') writes a string ('text') in the graph window. gtext puts a cross-hair in the graph window and waits for a mouse button or keyboard key to be pressed.

#### pdf

cannot see any pdfs

#### Annotation 1482061647116

 #matlab #programming Text may also be placed on a graph interactively with Tools -> Edit Plot from the figure window

#### pdf

cannot see any pdfs

#### Annotation 1482063219980

 #matlab #programming grid adds/removes grid lines to/from the current graph. The grid state may be toggled

#### pdf

cannot see any pdfs

#### Annotation 1482064792844

 #matlab #programming text(x, y, 'text') writes text in the graphics window at the point specified by x and y. If x and y are vectors, the text is written at each point. If the text is an indexed list, successive points are labeled with corresponding rows of the text

#### pdf

cannot see any pdfs

#### Annotation 1482066365708

 #matlab #programming title('text') writes the text as a title on top of the graph

#### pdf

cannot see any pdfs

#### Annotation 1482067938572

 #matlab #programming xlabel('horizontal') labels the x-axis

#### pdf

cannot see any pdfs

#### Annotation 1482069511436

 #matlab #programming ylabel('vertical') labels the y-axis

#### pdf

cannot see any pdfs

#### Annotation 1482071084300

 #matlab #programming There are at least three ways of drawing multiple plots on the same set of axes (which may however be rescaled if the new data falls outside the range of the previous data). 1. The easiest way is simply to use hold to keep the current plot on the axes. All subsequent plots are added to the axes until hold is released, either with hold off, or just hold, which toggles the hold state. 2. The second way is to use plot with multiple arguments
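
A minimal sketch of the first two ways (the data plotted is arbitrary):

```matlab
x = 0:0.1:2*pi;
% 1. Using hold:
plot(x, sin(x))
hold on
plot(x, cos(x))
hold off
% 2. Using plot with multiple arguments:
plot(x, sin(x), x, cos(x))
```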

#### pdf

cannot see any pdfs

#### Annotation 1482072657164

 #matlab #programming The third way is to use the form plot(x, y) where x and y may both be matrices, or where one may be a vector and one a matrix. If one of x or y is a matrix and the other is a vector, the rows or columns of the matrix are plotted against the vector, using a different color for each.

#### pdf

cannot see any pdfs

#### Annotation 1482074230028

 #matlab #programming If x is not specified, as in plot(y), where y is a matrix, the columns of y are plotted against the row index

#### pdf

cannot see any pdfs

#### Annotation 1482075802892

 #matlab #programming If x and y are both matrices of the same size, the columns of x are plotted against the columns of y

#### pdf

cannot see any pdfs

#### Annotation 1482077375756

 #matlab #programming plot(x, y, ’--’) joins the plotted points with dashed lines, whereas plot(x, y, ’o’) draws circles at the data points with no lines joining them

#### pdf

cannot see any pdfs

#### Annotation 1482078948620

 #matlab #programming The available colors are denoted by the symbols c, m, y, k, r, g, b, w

#### pdf

cannot see any pdfs

#### Annotation 1482080521484

 #matlab #programming axis([xmin, xmax, ymin, ymax]) sets the scaling on the current plot, i.e., draw the graph first, then reset the axis limits.

#### pdf

cannot see any pdfs

#### Annotation 1482082094348

 #matlab #programming If you want to specify one of the minimum or maximum of a set of axis limits, but want MATLAB to autoscale the other, use Inf or -Inf for the autoscaled limit

#### pdf

cannot see any pdfs

#### Annotation 1482083667212

 #matlab #programming You can return to the default of automatic axis scaling with axis auto

#### pdf

cannot see any pdfs

#### Annotation 1482085240076

 #matlab #programming The statement v = axis returns the current axis scaling in the vector v.

#### pdf

cannot see any pdfs

#### Annotation 1482086812940

 #matlab #programming Scaling is frozen at the current limits with axis manual so that if hold is turned on, subsequent plots will use the same limit

#### pdf

cannot see any pdfs

#### Annotation 1482088385804

 #matlab #programming in MATLAB the word ‘axes’ refers to a particular graphics object, which includes not only the x-axis and y-axis and their tick marks and labels, but also everything drawn on those particular axes: the actual graphs and any text included in the figure

#### pdf

cannot see any pdfs

#### Annotation 1482089958668

 #matlab #programming You can show a number of plots in the same figure window with the subplot function. It looks a little curious at first, but it’s quite easy to get the hang of it. The statement subplot(m, n, p) divides the figure window into m × n small sets of axes, and selects the pth set for the current plot (numbered by row from the left of the top row)
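
A small sketch of a 2 × 2 layout (the functions plotted are arbitrary):

```matlab
x = 0:pi/40:4*pi;
subplot(2, 2, 1), plot(x, sin(x))      % top-left panel (p = 1)
subplot(2, 2, 2), plot(x, cos(x))      % top-right panel (p = 2)
subplot(2, 2, 3), plot(x, sin(2*x))    % bottom-left panel (p = 3)
subplot(2, 2, 4), plot(x, cos(2*x))    % bottom-right panel (p = 4)
```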

#### pdf

cannot see any pdfs

#### Annotation 1482091531532

 #matlab #programming figure(h), where h is an integer, creates a new figure window, or makes figure h the current figure

#### pdf

cannot see any pdfs

#### Annotation 1482093104396

 #matlab #programming clf clears the current figure window. It also resets all properties associated with the axes, such as the hold state and the axis state

#### pdf

cannot see any pdfs

#### Annotation 1482094677260

 #matlab #programming cla deletes all plots and text from the current axes, i.e., leaves only the x- and y-axes and their associated information

#### pdf

cannot see any pdfs

#### Annotation 1482097822988

 #matlab #programming The command [x, y] = ginput allows you to select an unlimited number of points from the current graph using a mouse or arrow keys. A movable cross-hair appears on the graph. Clicking saves its co-ordinates in x(i) and y(i). Pressing Enter terminates the input.

#### pdf

cannot see any pdfs

#### Annotation 1482099920140

 #matlab #programming The command [x, y] = ginput(n) works like ginput except that you must select exactly n points

#### pdf

cannot see any pdfs

#### Annotation 1482101493004

 #matlab #programming The command semilogy(x, y) plots y with a log10 scale and x with a linear scale

#### pdf

cannot see any pdfs

#### Annotation 1482103065868

 #matlab #programming The command polar(theta, r) generates a polar plot of the points with angles in theta and magnitudes in r

#### pdf

cannot see any pdfs

#### Annotation 1482104638732

 #matlab #programming Plotting rapidly changing mathematical functions: fplot

#### pdf

cannot see any pdfs

#### Annotation 1482106211596

 #matlab #programming The function plot3 is the 3-D version of plot. The command plot3(x, y, z) draws a 2-D projection of a line in 3-D through the points whose co-ordinates are the elements of the vectors x, y and z
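
A short sketch, plotting a tapering helix (the particular curve is just an example):

```matlab
t = 0:pi/50:10*pi;
plot3(exp(-0.02*t).*cos(t), exp(-0.02*t).*sin(t), t)   % 2-D projection of a 3-D curve
xlabel('x'), ylabel('y'), zlabel('z')
```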

#### pdf

cannot see any pdfs

#### Annotation 1482107784460

 #matlab #programming The function comet3 is similar to plot3 except that it draws with a moving ‘comet head’.

#### pdf

cannot see any pdfs

#### Annotation 1482109357324

 #matlab #programming The function mesh draws a surface as a ‘wire frame’.

#### pdf

cannot see any pdfs

#### Annotation 1482110930188

 #matlab #programming An alternative visualization is provided by surf, which generates a faceted view of the surface (in color), i.e. the wire frame is covered with small tiles
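
A minimal sketch comparing the two views on the same surface; it assumes the standard meshgrid function (not mentioned in the excerpt) to build the grid, and the surface itself is arbitrary:

```matlab
[x, y] = meshgrid(-2:0.1:2, -2:0.1:2);   % grid of (x, y) points
z = x .* exp(-x.^2 - y.^2);              % sample surface values
subplot(1, 2, 1), mesh(z)                % wire-frame view
subplot(1, 2, 2), surf(z)                % faceted (tiled) view
```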

#### pdf

cannot see any pdfs

#### Annotation 1482112503052

 #matlab #programming contour(u) You should get a contour plot of the heat distribution

#### pdf

cannot see any pdfs

#### Annotation 1482114075916

 #matlab #programming The function contour can take a second input variable. It can be a scalar spec- ifying how many contour levels to plot, or it can be a vector specifying the values at which to plot the contour levels

#### pdf

cannot see any pdfs

#### Annotation 1482115648780

 #matlab #programming You can get a 3-D contour plot with contour3

#### pdf

cannot see any pdfs

#### Annotation 1482117221644

 #matlab #programming Contour levels may be labeled with clabel

#### pdf

cannot see any pdfs

#### Annotation 1482118794508

 #matlab #programming A 3-D contour plot may be drawn under a surface with meshc or surfc

#### pdf

cannot see any pdfs

#### Annotation 1482120367372

 #matlab #programming If a matrix for a surface plot contains NaNs, these elements are not plotted. This enables you to cut away (crop) parts of a surface

#### pdf

cannot see any pdfs

#### Annotation 1482121940236

 #matlab #programming The function quiver draws little arrows to indicate a gradient or other vector field

#### pdf

cannot see any pdfs

#### Annotation 1482123513100

 #matlab #programming The mesh function can also be used to ‘visualize’ a matrix

#### pdf

cannot see any pdfs

#### Annotation 1482125085964

 #matlab #programming The function spy is useful for visualizing sparse matrices

#### pdf

cannot see any pdfs

#### Annotation 1482126658828

 #matlab #programming The view function enables you to specify the angle from which you view a 3-D graph

#### pdf

cannot see any pdfs

#### Annotation 1482128231692

 #matlab #programming The function view takes two arguments. The first one, az in this example, is called the azimuth or polar angle in the x-y plane (in degrees). az rotates the viewpoint (you) about the z-axis (i.e. about the 'pinnacle' at (15,15) in Figure 9.12) in a counter-clockwise direction. The default value of az is −37.5°. The program therefore rotates you in a counter-clockwise direction about the z-axis in 15° steps starting at the default position. The second argument of view is the vertical elevation el (in degrees). This is the angle a line from the viewpoint makes with the x-y plane. A value of 90° for el means you are directly overhead. Positive values of the elevation mean you are above the x-y plane; negative values mean you are below it. The default value of el is 30°.

#### pdf

cannot see any pdfs

#### Annotation 1482129804556

 #matlab #programming The command pause(n) suspends execution for n seconds.

#### pdf

cannot see any pdfs

#### Annotation 1482131377420

 #matlab #programming You can rotate a 3-D figure interactively as follows. Click the Rotate 3-D button in the figure toolbar (first button from the right). Click on the axes and an outline of the figure appears to help you visualize the rotation. Drag the mouse

#### pdf

cannot see any pdfs

#### Annotation 1482134261004

 #biochem #biology #cell the cell can form a hydrogel that pulls these and other molecules into punctate structures called intracellular bodies, or granules. Specific mRNAs can be sequestered in such granules, where they are stored until made available by a controlled disassembly of the core amyloid structure that holds them together.

#### pdf

cannot see any pdfs

#### Annotation 1482135833868

 #biochem #biology #cell the FUS protein, an essential nuclear protein with roles in the transcription, processing, and transport of specific mRNA molecules.

#### pdf

cannot see any pdfs

#### Annotation 1482137931020

 #biochem #biology #cell Over 80 percent of its C-terminal domain of two hundred amino acids is composed of only four amino acids: glycine, serine, glutamine, and tyrosine. This low complexity domain is attached to several other domains that bind to RNA molecules.

#### pdf

cannot see any pdfs

#### Annotation 1482139503884

 #biochem #biology #cell At high enough concentrations in a test tube, FUS protein forms a hydrogel that will associate with either itself or with the low complexity domains from other proteins.