# on 02-Mar-2017 (Thu)

#### Annotation 1482045918476

 #matlab #programming Recursive functions are usually written in this way: an if statement handles the general recursive definition; the else part handles the special case, the base case (n = 1).
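
The same if/else shape can be sketched in Python (an illustrative factorial, not an example from the text):

```python
def fact(n):
    # The if handles the general recursive definition...
    if n > 1:
        return n * fact(n - 1)
    # ...and the else handles the special case (n = 1).
    else:
        return 1

print(fact(5))  # 120
```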

#### pdf

cannot see any pdfs

#### Annotation 1482047491340

 #matlab #programming A loop where the number of repetitions must be determined in advance is sometimes called determinate repetition.

#### Annotation 1482049064204

 #matlab #programming It often happens that the condition to end a loop is only satisfied during the execution of the loop itself. Such a structure is called indeterminate repetition.

#### Annotation 1482050637068

 #matlab #programming If there are a number of different conditions to stop a while loop, you may be tempted to use a for with the number of repetitions set to some accepted cut-off value (or even Inf), enclosing if statements which break out of the for when the various conditions are met. Why is this not regarded as the best programming style? Simply because when you read the code months later, you will have to wade through the whole loop to find all the conditions that end it, rather than seeing them all paraded at the start of the loop in the while clause.
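
The style point can be sketched in Python (a hypothetical loop; the extract is about MATLAB's while and for). The first version parades every stopping condition in the loop header; the second buries one in a break:

```python
import random

random.seed(42)
# Preferred: all stopping conditions visible in the while clause.
total, draws = 0.0, 0
while total < 10 and draws < 1000:
    total += random.random()
    draws += 1

random.seed(42)
# Discouraged: the same logic as a capped for loop with a buried break.
total2, draws2 = 0.0, 0
for _ in range(1000):
    if total2 >= 10:
        break
    total2 += random.random()
    draws2 += 1

assert (total, draws) == (total2, draws2)
```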

#### Annotation 1482052209932

 #matlab #programming Graphs (in 2-D) are drawn with the plot statement. In its simplest form, it takes a single vector argument as in plot(y)

#### Annotation 1482053782796

 #matlab #programming plot(y). In this case the elements of y are plotted against their indices, e.g., plot(rand(1, 20)) plots 20 random numbers against the integers 1–20, joining successive points with straight lines.

#### Annotation 1482055355660

 #matlab #programming Probably the most common form of plot is plot(x, y), where x and y are vectors of the same length, e.g., x = 0:pi/40:4*pi; plot(x, sin(x)). In this case, the co-ordinates of the i-th point are (x_i, y_i).
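
The co-ordinate construction can be mirrored in Python (a sketch of the MATLAB colon-operator example, without the plotting itself):

```python
import math

# Mirrors the MATLAB example x = 0:pi/40:4*pi; plot(x, sin(x)),
# building the same co-ordinate vectors.
step = math.pi / 40
n = round(4 * math.pi / step)            # 160 steps, so 161 points
x = [i * step for i in range(n + 1)]
y = [math.sin(v) for v in x]
print(len(x))  # 161
```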

#### Annotation 1482056928524

 #matlab #programming Straight-line graphs are drawn by giving the x and y co-ordinates of the endpoints in two vectors.

#### Annotation 1482058501388

 #matlab #programming MATLAB has a set of ‘easy-to-use’ plotting commands, all starting with the string ‘ez’. The easy-to-use form of plot is ezplot, e.g., ezplot(’tan(x)’)

#### Annotation 1482060074252

 #matlab #programming gtext(’text’) writes a string (’text’) in the graph window. gtext puts a cross-hair in the graph window and waits for a mouse button or keyboard key to be pressed.

#### Annotation 1482061647116

 #matlab #programming Text may also be placed on a graph interactively with Tools -> Edit Plot from the figure window

#### Annotation 1482063219980

 #matlab #programming grid adds/removes grid lines to/from the current graph. The grid state may be toggled

#### Annotation 1482064792844

 #matlab #programming text(x, y, ’text’) writes text in the graphics window at the point specified by x and y. If x and y are vectors, the text is written at each point. If the text is an indexed list, successive points are labeled with corresponding rows of the text.

#### Annotation 1482066365708

 #matlab #programming title(’text’) writes the text as a title on top of the graph

#### Annotation 1482067938572

 #matlab #programming xlabel(’horizontal’) labels the x-axis

#### Annotation 1482069511436

 #matlab #programming ylabel(’vertical’) labels the y-axis

#### Annotation 1482071084300

 #matlab #programming There are at least three ways of drawing multiple plots on the same set of axes (which may however be rescaled if the new data falls outside the range of the previous data). 1. The easiest way is simply to use hold to keep the current plot on the axes. All subsequent plots are added to the axes until hold is released, either with hold off, or just hold, which toggles the hold state. 2. The second way is to use plot with multiple arguments.

#### Annotation 1482072657164

 #matlab #programming The third way is to use the form plot(x, y) where x and y may both be matrices, or where one may be a vector and one a matrix. If one of x or y is a matrix and the other is a vector, the rows or columns of the matrix are plotted against the vector, using a different color for each.

#### Annotation 1482074230028

 #matlab #programming If x is not specified, as in plot(y), where y is a matrix, the columns of y are plotted against the row index.

#### Annotation 1482075802892

 #matlab #programming If x and y are both matrices of the same size, the columns of x are plotted against the columns of y

#### Annotation 1482077375756

 #matlab #programming plot(x, y, ’--’) joins the plotted points with dashed lines, whereas plot(x, y, ’o’) draws circles at the data points with no lines joining them

#### Annotation 1482078948620

 #matlab #programming The available colors are denoted by the symbols c, m, y, k, r, g, b, w

#### Annotation 1482080521484

 #matlab #programming axis( [xmin, xmax, ymin, ymax] ) which sets the scaling on the current plot, i.e., draw the graph first, then reset the axis limits.

#### Annotation 1482082094348

 #matlab #programming If you want to specify one of the minimum or maximum of a set of axis limits, but want MATLAB to autoscale the other, use Inf or -Inf for the autoscaled limit

#### Annotation 1482083667212

 #matlab #programming You can return to the default of automatic axis scaling with axis auto

#### Annotation 1482085240076

 #matlab #programming The statement v = axis returns the current axis scaling in the vector v.

#### Annotation 1482086812940

 #matlab #programming Scaling is frozen at the current limits with axis manual so that if hold is turned on, subsequent plots will use the same limit

#### Annotation 1482088385804

 #matlab #programming in MATLAB the word ‘axes’ refers to a particular graphics object, which includes not only the x-axis and y-axis and their tick marks and labels, but also everything drawn on those particular axes: the actual graphs and any text included in the figure

#### Annotation 1482089958668

 #matlab #programming You can show a number of plots in the same figure window with the subplot function. It looks a little curious at first, but it’s quite easy to get the hang of it. The statement subplot(m, n, p) divides the figure window into m × n small sets of axes, and selects the pth set for the current plot (numbered by row from the left of the top row)

#### Annotation 1482091531532

 #matlab #programming figure(h), where h is an integer, creates a new figure window, or makes figure h the current figure.

#### Annotation 1482093104396

 #matlab #programming clf clears the current figure window. It also resets all properties associated with the axes, such as the hold state and the axis state

#### Annotation 1482094677260

 #matlab #programming cla deletes all plots and text from the current axes, i.e., leaves only the x- and y-axes and their associated information

#### Annotation 1482097822988

 #matlab #programming The command [x, y] = ginput allows you to select an unlimited number of points from the current graph using a mouse or arrow keys. A movable cross-hair appears on the graph. Clicking saves its co-ordinates in x(i) and y(i). Pressing Enter terminates the input.

#### Annotation 1482099920140

 #matlab #programming The command [x, y] = ginput(n) works like ginput except that you must select exactly n points

#### Annotation 1482101493004

 #matlab #programming The command semilogy(x, y) plots y with a log10 scale and x with a linear scale.

#### Annotation 1482103065868

 #matlab #programming The command polar(theta, r) generates a polar plot of the points with angles in theta and magnitudes in r

#### Annotation 1482104638732

 #matlab #programming Plotting rapidly changing mathematical functions: fplot

#### Annotation 1482106211596

 #matlab #programming The function plot3 is the 3-D version of plot. The command plot3(x, y, z) draws a 2-D projection of a line in 3-D through the points whose co-ordinates are the elements of the vectors x, y and z

#### Annotation 1482107784460

 #matlab #programming The function comet3 is similar to plot3 except that it draws with a moving ‘comet head’.

#### Annotation 1482109357324

 #matlab #programming The function mesh draws a surface as a ‘wire frame’.

#### Annotation 1482110930188

 #matlab #programming An alternative visualization is provided by surf, which generates a faceted view of the surface (in color), i.e. the wire frame is covered with small tiles

#### Annotation 1482112503052

 #matlab #programming contour(u) You should get a contour plot of the heat distribution

#### Annotation 1482114075916

 #matlab #programming The function contour can take a second input variable. It can be a scalar spec- ifying how many contour levels to plot, or it can be a vector specifying the values at which to plot the contour levels

#### Annotation 1482115648780

 #matlab #programming You can get a 3-D contour plot with contour3

#### Annotation 1482117221644

 #matlab #programming Contour levels may be labeled with clabel

#### Annotation 1482118794508

 #matlab #programming A 3-D contour plot may be drawn under a surface with meshc or surfc

#### Annotation 1482120367372

 #matlab #programming If a matrix for a surface plot contains NaNs, these elements are not plotted. This enables you to cut away (crop) parts of a surface

#### Annotation 1482121940236

 #matlab #programming The function quiver draws little arrows to indicate a gradient or other vec- tor field

#### Annotation 1482123513100

 #matlab #programming The mesh function can also be used to ‘visualize’ a matrix

#### Annotation 1482125085964

 #matlab #programming The function spy is useful for visualizing sparse matrices

#### Annotation 1482126658828

 #matlab #programming The view function enables you to specify the angle from which you view a 3- D graph

#### Annotation 1482128231692

 #matlab #programming The function view takes two arguments. The first one, az in this example, is called the azimuth or polar angle in the x-y plane (in degrees). az rotates the viewpoint (you) about the z-axis—i.e. about the ‘pinnacle’ at (15,15) in Figure 9.12—in a counter-clockwise direction. The default value of az is −37.5°. The program therefore rotates you in a counter-clockwise direction about the z-axis in 15° steps starting at the default position. The second argument of view is the vertical elevation el (in degrees). This is the angle a line from the viewpoint makes with the x-y plane. A value of 90° for el means you are directly overhead. Positive values of the elevation mean you are above the x-y plane; negative values mean you are below it. The default value of el is 30°.

#### Annotation 1482129804556

 #matlab #programming The command pause(n) suspends execution for n seconds.

#### Annotation 1482131377420

 #matlab #programming You can rotate a 3-D figure interactively as follows. Click the Rotate 3-D button in the figure toolbar (first button from the right). Click on the axes and an outline of the figure appears to help you visualize the rotation. Drag the mouse

#### Annotation 1482134261004

 #biochem #biology #cell the cell can form a hydrogel that pulls these and other molecules into punctate structures called intracellular bodies, or granules. Specific mRNAs can be sequestered in such granules, where they are stored until made available by a controlled disassembly of the core amyloid structure that holds them together.

#### Annotation 1482135833868

 #biochem #biology #cell the FUS protein, an essential nuclear protein with roles in the transcription, processing, and transport of specific mRNA molecules.

#### Annotation 1482137931020

 #biochem #biology #cell Over 80 per- cent of its C-terminal domain of two hundred amino acids is composed of only four amino acids: glycine, serine, glutamine, and tyrosine. This low complexity domain is attached to several other domains that bind to RNA molecules.

#### Annotation 1482139503884

 #biochem #biology #cell At high enough concentrations in a test tube, FUS protein forms a hydrogel that will associate with either itself or with the low complexity domains from other proteins.

#### Annotation 1482141338892

 #biochem #biology #cell FUS low complexity domain binds most tightly to itself

#### Annotation 1482142911756

 #biochem #biology #cell both the homotypic and the heterotypic bindings are mediated through a β-sheet core structure forming amyloid fibrils, and that these structures bind to other types of repeat sequences

#### Annotation 1482144484620

 #biochem #biology #cell Many of these interactions appear to be controlled by the phosphorylation of serine side chains in one or both of the interacting partners.

#### Annotation 1482146057484

 #biochem #biology #cell The amyloid fibril is a long unbranched structure assembled through a repeating aggregate of β sheets.

#### Annotation 1482147630348

 #biochem #biology #cell The substance that is bound by the protein—whether it is an ion, a small molecule, or a macromolecule such as another protein—is referred to as a ligand for that protein (from the Latin word ligare, meaning “to bind”)

#### Annotation 1482149203212

 #biochem #biology #cell Even small changes to the amino acids in the interior of a protein molecule can change its three-dimensional shape enough to destroy a binding site on the surface

#### Annotation 1482151038220

 #biochem #biology #cell the interaction of neighboring parts of the polypeptide chain may restrict the access of water molecules to that protein’s ligand-binding sites. Because water molecules readily form hydrogen bonds that can compete with ligands for sites

#### Annotation 1482152611084

 #biochem #biology #cell a ligand will form tighter hydrogen bonds (and electro- static interactions) with a protein if water molecules are kept away

#### Annotation 1482154183948

 #biochem #biology #cell In effect, a protein can keep a ligand-binding site dry, increasing that site's reactivity, because it is energetically unfavorable for individual water molecules to break away from this network—as they must do to reach into a crevice on a protein’s surface.

#### Annotation 1482155756812

 #biochem #biology #cell the clustering of neighboring polar amino acid side chains can alter their reactivity. If protein folding forces together a number of negatively charged side chains against their mutual repulsion, for example, the affinity of the site for a positively charged ion is greatly increased.

#### Annotation 1482157329676

 #biochem #biology #cell when amino acid side chains interact with one another through hydrogen bonds, normally unreactive groups (such as the –CH2OH on the serine shown in Figure 3–39) can become reactive, enabling them to be used to make or break selected covalent bonds.

#### Annotation 1482158902540

 #biochem #biology #cell even when the amino acid sequence identity falls to 25%, the backbone atoms in a domain can follow a common protein fold within 0.2 nanometers (2 Å)

#### Annotation 1482160475404

 #biochem #biology #cell We can use a method called evolutionary tracing to identify those sites in a protein domain that are the most crucial to the domain’s function. Those sites that bind to other molecules are the most likely to be maintained, unchanged as organisms evolve. Thus, in this method, those amino acids that are unchanged, or nearly unchanged, in all of the known protein family members are mapped onto a model of the three-dimensional structure of one family member. When this is done, the most invariant positions often form one or more clusters on the protein surface, as illustrated in Figure 3–40A for the SH2 domain described previously (see Figure 3–6). These clusters generally correspond to ligand-binding sites.

#### Annotation 1482162048268

 #biochem #biology #cell The amino acids located at the binding site for the phosphorylated polypeptide have been the slowest to change during the long evolutionary process

#### Annotation 1482163621132

 #biochem #biology #cell In many cases, a portion of the surface of one protein contacts an extended loop of polypeptide chain (a “string”) on a second protein.

#### Annotation 1482165456140

 #biochem #biology #cell A second type of protein–protein interface forms when two α helices, one from each protein, pair together to form a coiled-coil

#### Annotation 1482167029004

 #biochem #biology #cell The most common way for proteins to interact, however, is by the precise matching of one rigid surface with that of another

#### Annotation 1482168601868

 #biochem #biology #cell Different antibodies generate an enormous diversity of antigen-binding sites by changing only the length and amino acid sequence of these loops, without altering the basic protein structure.

#### Annotation 1482170174732

 #biochem #biology #cell A detailed examination of the antigen-binding sites of antibodies reveals that they are formed from several loops of polypeptide chain that protrude from the ends of a pair of closely juxtaposed protein domains

#### Annotation 1482171747596

 #biochem #biology #cell Antibodies are Y-shaped molecules with two identical binding sites that are complementary to a small portion of the surface of the antigen molecule

#### Annotation 1482172271884

 #biochem #biology #cell Loops of this kind are ideal for grasping other molecules. They allow a large number of chemical groups to surround a ligand so that the protein can link to it with many weak bonds. For this reason, loops often form the ligand-binding sites in proteins

#### Annotation 1482174893324

 #biochem #biology #cell Strong interactions occur in cells whenever a biological function requires that molecules remain associated for a long time—for example, when a group of RNA and protein molecules come together to make a subcellular structure such as a ribosome.

#### Annotation 1482191408396

 #deeplearning #neuralnetworks Self-information deals only with a single outcome. We can quantify the amount of uncertainty in an entire probability distribution using the Shannon entropy: $$H(\mathrm{x}) = \mathbb{E}_{x \sim P}[I(x)] = -\mathbb{E}_{x \sim P}[\log P(x)]$$ (3.49), also denoted H(P).

#### Annotation 1482192981260

 #deeplearning #neuralnetworks the Shannon entropy of a distribution is the expected amount of information in an event drawn from that distribution.

#### Annotation 1482194554124

 #deeplearning #neuralnetworks It gives a lower bound on the number of bits (if the logarithm is base 2, otherwise the units are different) needed on average to encode symbols drawn from a distribution P.

#### Annotation 1482196126988

 #deeplearning #neuralnetworks Distributions that are nearly deterministic (where the outcome is nearly certain) have low entropy; distributions that are closer to uniform have high entropy.
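
A quick numeric check of this claim (a Python sketch, in nats; the two distributions are arbitrary illustrations):

```python
import math

def shannon_entropy(p):
    # H(P) = -sum_x P(x) log P(x), in nats; 0 log 0 is treated as 0.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]       # nearly deterministic
assert shannon_entropy(uniform) > shannon_entropy(peaked)
```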

#### Annotation 1482197699852

 #deeplearning #neuralnetworks When x is continuous, the Shannon entropy is known as the differential entropy.

#### Annotation 1482199272716

 #deeplearning #neuralnetworks If we have two separate probability distributions P(x) and Q(x) over the same random variable x, we can measure how different these two distributions are using the Kullback-Leibler (KL) divergence: $$D_{KL}(P \| Q) = \mathbb{E}_{x \sim P}\left[\log \frac{P(x)}{Q(x)}\right] = \mathbb{E}_{x \sim P}[\log P(x) - \log Q(x)]$$

#### Annotation 1482200845580

 #deeplearning #neuralnetworks Kullback-Leibler (KL) divergence: $$D_{KL}(P \| Q) = E_{x\sim P}\left[\log\frac{P(x)}{Q(x)}\right] = E_{x \sim P}[\log P(x) - \log Q(x)]$$ (3.50) In the case of discrete variables, it is the extra amount of information (measured in bits if we use the base-2 logarithm, but in machine learning we usually use nats and the natural logarithm) needed to send a message containing symbols drawn from probability distribution P, when we use a code that was designed to minimize the length of messages drawn from probability distribution Q.

#### Annotation 1482202418444

 #deeplearning #neuralnetworks The KL divergence has many useful properties, most notably that it is non-negative.

#### Annotation 1482203991308

 #deeplearning #neuralnetworks The KL divergence is 0 if and only if P and Q are the same distribution in the case of discrete variables, or equal “almost everywhere” in the case of continuous variables.

#### Annotation 1482205564172

 #deeplearning #neuralnetworks Because the KL divergence is non-negative and measures the difference between two distributions, it is often conceptualized as measuring some sort of distance between these distributions. However, it is not a true distance measure because it is not symmetric.
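
A minimal Python sketch of these properties for discrete distributions (the particular distributions are arbitrary examples):

```python
import math

def kl(p, q):
    # D_KL(P||Q) = sum_x P(x) (log P(x) - log Q(x)), in nats.
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0)

p, q = [0.5, 0.5], [0.9, 0.1]
assert kl(p, q) >= 0 and kl(q, p) >= 0   # non-negative
assert abs(kl(p, q) - kl(q, p)) > 1e-6   # but not symmetric
assert abs(kl(p, p)) < 1e-12             # zero when P == Q
```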

#### Annotation 1482207137036

 #deeplearning #neuralnetworks A quantity that is closely related to the KL divergence is the cross-entropy $$H(P, Q) = H(P) + D_{KL}(P \| Q)$$, which is similar to the KL divergence but lacking the term on the left: $$H(P, Q) = -\mathbb{E}_{x \sim P} \log Q(x)$$

#### Annotation 1482208709900

 #deeplearning #neuralnetworks Minimizing the cross-entropy with respect to Q is equivalent to minimizing the KL divergence, because Q does not participate in the omitted term.
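
The identity between cross-entropy, entropy, and KL divergence can be checked numerically (a Python sketch over a small discrete distribution):

```python
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def kl(p, q):
    return sum(x * (math.log(x) - math.log(y)) for x, y in zip(p, q) if x > 0)

def cross_entropy(p, q):
    # H(P, Q) = -E_{x~P} log Q(x)
    return -sum(x * math.log(y) for x, y in zip(p, q) if x > 0)

p, q = [0.2, 0.8], [0.6, 0.4]
# H(P, Q) = H(P) + D_KL(P||Q): the H(P) term does not involve Q,
# so minimizing either quantity over Q gives the same answer.
assert abs(cross_entropy(p, q) - (entropy(p) + kl(p, q))) < 1e-12
```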

#### Annotation 1482210282764

 #deeplearning #neuralnetworks the form 0 log 0. By convention, in the context of information theory, we treat these expressions as $$\lim_{x \to 0} x \log x = 0$$

#### Annotation 1482211855628

 #deeplearning #neuralnetworks we can greatly reduce the cost of representing a distribution if we are able to find a factorization into distributions over fewer variables.

#### Annotation 1482213428492

 #deeplearning #neuralnetworks When we represent the factorization of a probability distribution with a graph, we call it a structured probabilistic model or graphical model.

#### Annotation 1482215001356

 #deeplearning #neuralnetworks There are two main kinds of structured probabilistic models: directed and undirected.

#### Annotation 1482216574220

 #deeplearning #neuralnetworks graphical models use a graph G in which each node in the graph corresponds to a random variable, and an edge connecting two random variables means that the probability distribution is able to represent direct interactions between those two random variables.

#### Annotation 1482218147084

 #deeplearning #neuralnetworks Directed models use graphs with directed edges, and they represent factorizations into conditional probability distributions.

#### Annotation 1482219719948

 #deeplearning #neuralnetworks a directed model contains one factor for every random variable x_i in the distribution, and that factor consists of the conditional distribution over x_i given the parents of x_i, denoted Pa_G(x_i): $$p(\mathbf{x}) = \prod_i p(x_i \mid Pa_{\mathcal{G}}(x_i))$$

#### Annotation 1482221292812

 #deeplearning #neuralnetworks Undirected models use graphs with undirected edges, and they represent factorizations into a set of functions.

#### Annotation 1482222865676

 #deeplearning #neuralnetworks Any set of nodes that are all connected to each other in G is called a clique.

#### Annotation 1482224438540

 #deeplearning #neuralnetworks Each clique C^{(i)} in an undirected model is associated with a factor φ^{(i)}(C^{(i)}).

#### Annotation 1482226011404

 #deeplearning #neuralnetworks Each clique C^{(i)} in an undirected model is associated with a factor φ^{(i)}(C^{(i)}). These factors are just functions, not probability distributions.

#### Annotation 1482227584268

 #deeplearning #neuralnetworks The probability of a configuration of random variables is proportional to the product of all of these factors—assignments that result in larger factor values are more likely.

#### Annotation 1482229157132

 #deeplearning #neuralnetworks We therefore divide by a normalizing constant Z, defined to be the sum or integral over all states of the product of the φ functions, in order to obtain a normalized probability distribution: $$p(\mathbf{x}) = \frac{1}{Z} \prod_i \phi^{(i)}(C^{(i)})$$
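
A toy Python sketch of this normalization, assuming a single hypothetical clique factor over two binary variables:

```python
from itertools import product

# Hypothetical single-clique factor over two binary variables,
# favoring configurations where the variables agree.
def phi(a, b):
    return 2.0 if a == b else 1.0

# Z sums the product of factors over all states (here, one factor).
Z = sum(phi(a, b) for a, b in product([0, 1], repeat=2))

def p(a, b):
    return phi(a, b) / Z

# Dividing by Z yields a properly normalized distribution.
assert abs(sum(p(a, b) for a, b in product([0, 1], repeat=2)) - 1.0) < 1e-12
```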

#### Annotation 1482230729996

 #deeplearning #neuralnetworks numerical computation. This typically refers to algorithms that solve mathematical problems by methods that update estimates of the solution via an iterative process, rather than analytically deriving a formula providing a symbolic expression for the correct solution.

#### Annotation 1482232302860

 #deeplearning #neuralnetworks optimization (finding the value of an argument that minimizes or maximizes a function)

#### Annotation 1482233875724

 #deeplearning #neuralnetworks The fundamental difficulty in performing continuous math on a digital computer is that we need to represent infinitely many real numbers with a finite number of bit patterns.

#### Annotation 1482235448588

 #deeplearning #neuralnetworks Underflow occurs when numbers near zero are rounded to zero.

#### Annotation 1482237021452

 #deeplearning #neuralnetworks Overflow occurs when numbers with large magnitude are approximated as ∞ or −∞.

#### Annotation 1482238594316

 #deeplearning #neuralnetworks The softmax function is often used to predict the probabilities associated with a multinoulli distribution.

#### Annotation 1482240167180

 #deeplearning #neuralnetworks The softmax function is defined to be $$\mathrm{softmax}(\mathbf{x})_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}$$

#### Annotation 1482241740044

 #deeplearning #neuralnetworks Both of these difficulties can be resolved by instead evaluating softmax(z) where z = x − max_i x_i. Simple algebra shows that the value of the softmax function is not changed analytically by adding or subtracting a scalar from the input vector. Subtracting max_i x_i results in the largest argument to exp being 0, which rules out the possibility of overflow. Likewise, at least one term in the denominator has a value of 1, which rules out the possibility of underflow in the denominator leading to a division by zero.
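
A Python sketch of the stabilized evaluation (pure Python rather than any particular library):

```python
import math

def softmax(x):
    # Evaluate softmax(z) with z = x - max_i x_i: the largest exp
    # argument becomes 0, so overflow is ruled out, and at least one
    # denominator term equals 1, so the denominator cannot underflow.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

out = softmax([1000.0, 1000.0])   # naive exp(1000) would overflow
assert abs(sum(out) - 1.0) < 1e-12
assert abs(out[0] - 0.5) < 1e-12
```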

#### Annotation 1482243312908

 #deeplearning #neuralnetworks Theano (Bergstra et al., 2010; Bastien et al., 2012) is an example of a software package that automatically detects and stabilizes many common numerically unstable expressions that arise in the context of deep learning.

#### Annotation 1482244885772

 #deeplearning #neuralnetworks Conditioning refers to how rapidly a function changes with respect to small changes in its inputs.

#### Annotation 1482246458636

 #deeplearning #neuralnetworks Consider the function f(x) = A^{−1}x. When A ∈ ℝ^{n×n} has an eigenvalue decomposition, its condition number is $$\max_{i,j} \left| \frac{\lambda_i}{\lambda_j} \right|$$ (4.2) This is the ratio of the magnitude of the largest and smallest eigenvalue.
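
For a diagonal matrix the eigenvalues can be read off directly, which allows a quick sketch of the condition-number computation (Python; the spectrum is a hypothetical example):

```python
# For a diagonal matrix the eigenvalues are just the diagonal entries,
# so the condition number max_{i,j} |lambda_i / lambda_j| can be
# computed directly from the spectrum.
eigenvalues = [10.0, 2.0, 0.5]   # hypothetical spectrum of A
cond = max(abs(a / b) for a in eigenvalues for b in eigenvalues)
assert cond == 20.0              # |10 / 0.5|
```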

#### Annotation 1482248031500

 #deeplearning #neuralnetworks Optimization refers to the task of either minimizing or maximizing some function f(x) by altering x.

#### Annotation 1482249604364

 #deeplearning #neuralnetworks We usually phrase most optimization problems in terms of minimizing f(x). Maximization may be accomplished via a minimization algorithm by minimizing −f(x).

#### Annotation 1482251177228

 #deeplearning #neuralnetworks The function we want to minimize or maximize is called the objective function or criterion. When we are minimizing it, we may also call it the cost function, loss function, or error function.

#### Annotation 1482252750092

 #deeplearning #neuralnetworks We often denote the value that minimizes or maximizes a function with a superscript ∗. For example, we might say x∗ = arg min f(x).

#### Annotation 1482254322956

 #deeplearning #neuralnetworks The derivative is therefore useful for minimizing a function because it tells us how to change x in order to make a small improvement in y. For example, we know that f(x − ε sign(f′(x))) is less than f(x) for small enough ε. We can thus reduce f(x) by moving x in small steps with the opposite sign of the derivative. This technique is called gradient descent.
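
A minimal gradient-descent sketch in Python (the quadratic objective and step size are illustrative assumptions, not from the text):

```python
def gradient_descent(df, x, eps=0.1, steps=100):
    # Repeatedly move x in small steps against the derivative:
    # x <- x - eps * f'(x), which reduces f(x) for small enough eps.
    for _ in range(steps):
        x = x - eps * df(x)
    return x

# Minimize f(x) = (x - 3)^2, whose derivative is f'(x) = 2(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x=0.0)
assert abs(x_min - 3.0) < 1e-6
```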

#### Annotation 1482255895820

 #deeplearning #neuralnetworks When f′(x) = 0, the derivative provides no information about which direction to move. Points where f′(x) = 0 are known as critical points or stationary points.

#### pdf

cannot see any pdfs

#### Annotation 1482257468684

 #deeplearning #neuralnetworks A local minimum is a point where f(x) is lower than at all neighboring points, so it is no longer possible to decrease f(x) by making infinitesimal steps.

#### pdf

cannot see any pdfs

#### Annotation 1482259041548

 #deeplearning #neuralnetworks A local maximum is a point where f(x) is higher than at all neighboring points.

#### pdf

cannot see any pdfs

#### Annotation 1482260614412

 #deeplearning #neuralnetworks Some critical points are neither maxima nor minima. These are known as saddle points.

#### pdf

cannot see any pdfs

#### Annotation 1482262973708

 #bayes #programming #r #statistics The concept of representing a distribution by a large representative sample is foundational for the approach we take to Bayesian analysis of complex models

#### pdf

cannot see any pdfs

#### Annotation 1482264546572

 #bayes #programming #r #statistics What is new in the present application is that the population from which we are sampling is a mathematically defined distribution, such as a posterior probability distribution.

#### pdf

cannot see any pdfs

#### Annotation 1482266119436

 #bayes #programming #r #statistics Here is a summary of our algorithm for moving from one position to another. We are currently at position $\theta_{current}$. We then propose to move one position right or one position left. The specific proposal is determined by flipping a coin, which can result in 50% heads (move right) or 50% tails (move left). The range of possible proposed moves, and the probability of proposing each, is called the proposal distribution. In the present algorithm, the proposal distribution is very simple: It has only two values with 50-50 probabilities. Having proposed a move, we then decide whether or not to accept it. The acceptance decision is based on the value of the target distribution at the proposed position, relative to the value of the target distribution at our current position. Specifically, if the target distribution is greater at the proposed position than at our current position, then we definitely accept the proposed move: We always move higher if we can. On the other hand, if the target distribution is less at the proposed position than at our current position, we accept the move probabilistically: We move to the proposed position with probability $p_{move} = P(\theta_{proposed})/P(\theta_{current})$, where $P(\theta)$ is the value of the target distribution at $\theta$. We can combine these two possibilities, of the target distribution being higher or lower at the proposed position than at our current position, into a single expression for the probability of moving to the proposed position: $$p_{move} = \min\left( \frac{P(\theta_{proposed})}{P(\theta_{current})},\, 1 \right) \qquad (7.1)$$ Notice that Equation 7.1 says that when $P(\theta_{proposed}) > P(\theta_{current})$, then $p_{move} = 1$. Notice also that the target distribution, $P(\theta)$, does not need to be normalized, which means it does not need to sum to 1 as a probability distribution must. This is because what matters for our choice is the ratio, $P(\theta_{proposed})/P(\theta_{current})$, not the absolute magnitude of $P(\theta)$.
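
The procedure above can be sketched as a short simulation. The discrete target here, $P(\theta) \propto \theta$ on positions 1 through 7, is an illustrative unnormalized choice, and the function names are hypothetical:

```python
import random

# Unnormalized target distribution: P(theta) proportional to theta on 1..7,
# zero elsewhere. The algorithm only ever uses ratios, so no normalization.
def target(theta):
    return theta if 1 <= theta <= 7 else 0

def metropolis(steps, start=4, seed=0):
    rng = random.Random(seed)
    theta = start
    samples = []
    for _ in range(steps):
        proposed = theta + rng.choice([-1, 1])          # 50-50 left/right proposal
        p_move = min(target(proposed) / target(theta), 1)  # Equation 7.1
        if rng.random() < p_move:                       # accept probabilistically
            theta = proposed
        samples.append(theta)
    return samples

samples = metropolis(50_000)
```

In the long run the relative frequency of each position approximates the (normalized) target distribution, even though `target` was never normalized.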

#### pdf

cannot see any pdfs

#### Annotation 1482267692300

 #bayes #programming #r #statistics Notice what we must be able to do in the random-walk process:
- We must be able to generate a random value from the proposal distribution, to create $\theta_{proposed}$.
- We must be able to evaluate the target distribution at any proposed position, to compute $P(\theta_{proposed})/P(\theta_{current})$.
- We must be able to generate a random value from a uniform distribution, to accept or reject the proposal according to $p_{move}$.

#### pdf

cannot see any pdfs

#### Annotation 1482269265164

 #bayes #programming #r #statistics Suppose we are at position $\theta$. The probability of moving to $\theta + 1$, denoted $p(\theta \rightarrow \theta+1)$, is the probability of proposing that move times the probability of accepting it if proposed, which is $p(\theta \rightarrow \theta+1) = 0.5 \cdot \min\big( P(\theta+1)/P(\theta),\, 1 \big)$.

#### pdf

cannot see any pdfs

#### Annotation 1482270838028

 #bayes #programming #r #statistics On the other hand, if we are presently at position $\theta + 1$, the probability of moving to $\theta$ is the probability of proposing that move times the probability of accepting it if proposed, which is $p(\theta+1 \rightarrow \theta) = 0.5 \cdot \min\big( P(\theta)/P(\theta+1),\, 1 \big)$.

#### pdf

cannot see any pdfs

#### Annotation 1482272410892

 #bayes #programming #r #statistics The ratio of the transition probabilities is $$\frac{p(\theta \rightarrow \theta+1)}{p(\theta+1 \rightarrow \theta)} = \frac{0.5\,\min\big(P(\theta+1)/P(\theta),\,1\big)}{0.5\,\min\big(P(\theta)/P(\theta+1),\,1\big)} = \begin{cases} \dfrac{1}{P(\theta)/P(\theta+1)} & \text{if } P(\theta+1) > P(\theta) \\ \dfrac{P(\theta+1)/P(\theta)}{1} & \text{if } P(\theta+1) < P(\theta) \end{cases} = \frac{P(\theta+1)}{P(\theta)}$$
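
A quick numeric spot-check of this identity, using a hypothetical unnormalized target $P(\theta) = \theta$ and made-up helper names:

```python
# Unnormalized target: P(theta) = theta (any positive function would do)
def P(theta):
    return float(theta)

# Probability of moving a -> b: probability of proposing the move (0.5)
# times the probability of accepting it if proposed
def move_prob(a, b):
    return 0.5 * min(P(b) / P(a), 1)

theta = 3
lhs = move_prob(theta, theta + 1) / move_prob(theta + 1, theta)
rhs = P(theta + 1) / P(theta)
print(lhs, rhs)  # both equal 4/3
```

The equality of `lhs` and `rhs` is exactly the ratio derived above, and it is what makes the target distribution stable under the random walk.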

#### pdf

cannot see any pdfs

#### Annotation 1482273983756

 #bayes #programming #r #statistics When the vector of position probabilities is the target distribution, it stays that way on the next time step! In other words, the position probabilities are stable at the target distribution

#### pdf

cannot see any pdfs

#### Annotation 1482275556620

 #bayes #programming #r #statistics Sample values from the target distribution are generated by taking a random walk through the parameter space. The walk starts at some arbitrary point, specified by the user. The starting point should be someplace where P(θ ) is nonzero. The random walk progresses at each time step by proposing a move to a new position in parameter space and then deciding whether or not to accept the proposed move. Proposal distributions can take on many different forms, with the goal being to use a proposal distribution that efficiently explores the regions of the parameter space where P(θ ) has most of its mass.

#### pdf

cannot see any pdfs

#### Annotation 1482277129484

 #bayes #programming #r #statistics Having generated a proposed new position, the algorithm then decides whether or not to accept the proposal. The decision rule is exactly what was already specified in Equation 7.1. In detail, this is accomplished by computing the ratio $p_{move} = P(\theta_{proposed})/P(\theta_{current})$.

#### pdf

cannot see any pdfs

#### Annotation 1482278702348

 #matlab #programming Here’s how to get handles: The functions that draw graphics objects can also be used to return the handle of the object drawn, e.g.,
x = 0:pi/20:2*pi;
hsin = plot(x, sin(x))
hold on
hx = xlabel('x')

#### pdf

cannot see any pdfs

#### Annotation 1482280275212

 #matlab #programming gcf gets the handle of the current figure, e.g., hf = gcf

#### pdf

cannot see any pdfs

#### Annotation 1482281848076

 #matlab #programming gca gets the handle of the current axes

#### pdf

cannot see any pdfs

#### Annotation 1482283420940

 #matlab #programming gco gets the handle of the current graphics object, which is the last graphics object created or clicked on. For example, draw the sine graph above and get its handle hsin. Click on the graph in the figure window. Then enter the command ho = gco

#### pdf

cannot see any pdfs

#### Annotation 1482284993804

 #matlab #programming Handle Graphics objects are the basic elements used in MATLAB graphics. The objects are arranged in a parent-child inheritance structure

#### pdf

cannot see any pdfs

#### Annotation 1482286566668

 #matlab #programming To see all the property names of an object and their current values use get(h) where h is the object’s handle

#### pdf

cannot see any pdfs

#### Annotation 1482288139532

 #matlab #programming You can change any property value with the set function: set(handle, 'PropertyName', PropertyValue)

#### pdf

cannot see any pdfs

#### Annotation 1482289712396

 #matlab #programming The command set(handle) lists all the possible property values (where appropriate)

#### pdf

cannot see any pdfs

#### Annotation 1482291285260

 #matlab #programming If a graphics object has a number of children the get command used with the children property returns a vector of the children’s handles

#### pdf

cannot see any pdfs

#### Annotation 1482292858124

 #matlab #programming The answer is that the handles of children of the axes are returned in the reverse order in which they are created

#### pdf

cannot see any pdfs

#### Annotation 1482294430988

 #matlab #programming If you are desperate and don’t know the handles of any of your graphics objects you can use the findobj function to get the handle of an object with a property value that uniquely identifies it

#### pdf

cannot see any pdfs

#### Annotation 1482296003852

 #matlab #programming you can specify the parent of an object when you create it