# on 06-Jan-2018 (Sat)

#### Flashcard 1621391707404

Tags
#tvm
Question
The [...] compensates investors for the increased sensitivity of the market value of debt to a change in market interest rates.

status measured difficulty not learned 37% [default] 0

#### Flashcard 1622690106636

Tags
#tvm
Question
Shortcut formula: PV of even cash flows (annuity)

PV = [...]
$$PV = A \left( \frac{1-\frac{1}{(1+r)^n}}{r} \right)$$

status measured difficulty not learned 37% [default] 0
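The annuity shortcut above can be checked numerically; a minimal sketch (the function name and figures are illustrative):

```python
# PV of a level payment A received at the end of each of n periods,
# discounted at per-period rate r, per the annuity shortcut above.
def annuity_pv(A, r, n):
    return A * (1 - (1 + r) ** -n) / r

# e.g. 100 per period for 5 periods at 10%:
print(round(annuity_pv(100, 0.10, 5), 2))  # → 379.08
```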

#### Flashcard 1622926036236

Tags
#discounted-cashflow-applications
Question
What is the NPV rule?

An investment should be undertaken if its NPV is positive but not undertaken if its NPV is negative.

status measured difficulty not learned 37% [default] 0
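A minimal sketch of the NPV rule in code; the cash flows and discount rate below are made-up illustration values:

```python
# NPV rule: discount each cash flow at rate r; accept only if NPV > 0.
def npv(r, cashflows):
    return sum(cf / (1 + r) ** t for t, cf in enumerate(cashflows))

project = [-100, 60, 60]             # outlay now, two year-end inflows
print(round(npv(0.10, project), 2))  # → 4.13, positive, so undertake it
```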

#### Flashcard 1632002510092

Tags
Question
When calculating the Time Weighted rate of return if the measurement period < 1 year, [...] to get an annualized rate of return for the year.
compound holding period returns

status measured difficulty not learned 37% [default] 0

Subject 3. Dollar-weighted and Time-weighted Rates of Return
sub-period: HPR = (Dividends + Ending Price)/Beginning Price − 1. For the first year, HPR 1: (150 + 10)/100 − 1 = 0.60. For the second year, HPR 2: (280 + 20)/300 − 1 = 0. Calculate the time-weighted rate of return: If the measurement period < 1 year, compound holding period returns to get an annualized rate of return for the year. If the measurement period > 1 year, take the geometric mean of the annual returns.
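A minimal sketch of the geometric-mean step, using the two annual HPRs from the excerpt's example:

```python
# Time-weighted return over the two annual HPRs above (0.60 and 0.0):
# measurement period > 1 year, so take the geometric mean.
hprs = [0.60, 0.0]
growth = 1.0
for hpr in hprs:
    growth *= 1 + hpr
twr = growth ** (1 / len(hprs)) - 1
print(round(twr, 4))  # → 0.2649
```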

#### Flashcard 1633798982924

Question
A zero-coupon bond is a debt security that doesn't [...] but is traded at a [...]
pay interest (a coupon)

deep discount.

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
A zero-coupon bond is a debt security that doesn't pay interest (a coupon) but is traded at a deep discount, rendering profit at maturity when the bond is redeemed for its full face value.

#### Original toplevel document

Zero-Coupon Bond
What is a 'Zero-Coupon Bond'? A zero-coupon bond, also known as an "accrual bond," is a debt security that doesn't pay interest (a coupon) but is traded at a deep discount, rendering profit at maturity when the bond is redeemed for its full face value. Some zero-coupon bonds are issued as such, while others are bonds that have been stripped of their coupons by a financial institution and then repackaged as zero-coupon bonds. Because t

#### Flashcard 1633877101836

Tags
Question
A zero-coupon bond with a maturity of five years will mature in [...] periods.
10 6-month

status measured difficulty not learned 37% [default] 0

Subject 5. Bond Equivalent Yield
Periodic bond yields for both straight and zero-coupon bonds are conventionally computed based on semi-annual periods, as U.S. bonds typically make two coupon payments per year. For example, a zero-coupon bond with a maturity of five years will mature in 10 6-month periods. The periodic yield for that bond, r, is indicated by the equation Price = Maturity value x (1 + r)^(-10). This yield is an internal rate of return with semi-annual compounding. How do we
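Solving that equation for r is a one-liner; a sketch assuming an illustrative price (the 744.09 figure is made up to give a round yield):

```python
# Solving Price = Maturity value * (1 + r)^(-10) for the periodic
# (semi-annual) yield r of the five-year zero-coupon bond above.
def zero_periodic_yield(price, maturity_value, periods):
    return (maturity_value / price) ** (1 / periods) - 1

r = zero_periodic_yield(744.09, 1000, 10)
print(round(r, 4))  # → 0.03 (3% per 6-month period)
```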

#### Flashcard 1634542161164

Tags
#baii
Question
How do I reset to the factory defaults?
2nd → Reset → Enter

status measured difficulty not learned 37% [default] 0

#### Flashcard 1635099741452

Tags
#statistical-concepts-and-market-returns
Question
A [...] is all members of a specified group.

status measured difficulty not learned 37% [default] 0

#### Flashcard 1635107081484

Tags
#statistical-concepts-and-market-returns
Question
• A [...] is a quantity computed from or used to describe a sample.

status measured difficulty not learned 37% [default] 0

#### Flashcard 1635136179468

Tags
#statistical-concepts-and-market-returns
Question

The [...] is the absolute frequency of each interval divided by the total number of observations.

status measured difficulty not learned 37% [default] 0

#### Flashcard 1635224521996

Tags
#baii
Question
What sign should the PV have?
Negative

status measured difficulty not learned 37% [default] 0

#### Flashcard 1635444722956

Tags
Question
Two financial examples of ratio scales are [...] and [...]
rates of return and money.

status measured difficulty not learned 37% [default] 0

Subject 2. Measurement Scales
ing and addition or subtraction, ratio scales allow computation of meaningful ratios. A good example is the Kelvin scale of temperature. This scale has an absolute zero. Thus, a temperature of 300°K is twice as high as a temperature of 150°K. Two financial examples of ratio scales are rates of return and money. Both examples can be measured on a zero scale, where zero represents no return, or in the case of money, no money. Note that as you move down through this list, the measur

#### Flashcard 1636650061068

Tags
Question
The mean, median, and mode are equal in [...].
symmetric distributions

status measured difficulty not learned 37% [default] 0

Subject 4. Measures of Center Tendency
of n = 2 and n = 3 are given by: and so on. For n = 2, the harmonic mean is related to arithmetic mean A and geometric mean G by: The mean, median, and mode are equal in symmetric distributions. The mean is higher than the median in positively skewed distributions and lower than the median in negatively skewed distributions. Extreme values affect the value of the mean, while th

#### Flashcard 1636864232716

Tags
Question
A possible value of a random variable is called an [...]
outcome

status measured difficulty not learned 37% [default] 0

#### Flashcard 1637115890956

Tags
Question

The probability of an event estimated as a relative frequency of occurrence is called an [...]

Empirical probability

status measured difficulty not learned 37% [default] 0

#### Flashcard 1641175190796

Tags
Question
The class mark is also called [...]
Midvalue or central value

status measured difficulty not learned 37% [default] 0

#### Flashcard 1644907072780

Tags
Question
Odds against E = [...]
[1 − P (E) ] / P (E)

status measured difficulty not learned 37% [default] 0

#### Flashcard 1645856558348

Tags
#investopedia
Question
[...] is an investment technique of buying a fixed dollar amount of a particular investment on a regular schedule, regardless of the share price.
Dollar-cost averaging (DCA)

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Dollar-cost averaging (DCA) is an investment technique of buying a fixed dollar amount of a particular investment on a regular schedule, regardless of the share price. The investor purchases more shares when prices are low and fewer shares when prices are high. The premise is that DCA lowers the average share cost over time, increasing the opportunity

#### Original toplevel document

Dollar-Cost Averaging (DCA)
What is 'Dollar-Cost Averaging - DCA'? Dollar-cost averaging (DCA) is an investment technique of buying a fixed dollar amount of a particular investment on a regular schedule, regardless of the share price. The investor purchases more shares when prices are low and fewer shares when prices are high. The premise is that DCA lowers the average share cost over time, increasing the opportunity to profit. The DCA technique does not guarantee that an investor won't lose money on investments. Rather, it is meant to allow investment over time instead of investment as a lump sum. BREAKING DOWN 'Dollar-Cost Averaging - DCA' Fundamental to the strategy is a commitment to investing a fixed dollar amount each month. Depending

#### Flashcard 1646434585868

Tags
Question
For a random variable X, the expected value of X is denoted [...]
E(X).

status measured difficulty not learned 37% [default] 0
Subject 6. Expected Value, Variance, and Standard Deviation of a Random Variable
ed with probability, the expected value simply factors in the relative chances of each event occurring, in order to determine the overall result. The more probable outcomes will have a greater weighting in the overall calculation. For a random variable X, the expected value of X is denoted E(X). E(X) = P(x1)x1 + P(x2)x2 + ... + P(xn)xn. In investment analysis, forecasts are frequently made using expected value, for example,
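The probability-weighted sum above is straightforward to sketch (the example distribution is made up):

```python
# E(X) = P(x1)x1 + P(x2)x2 + ... + P(xn)xn, per the excerpt above.
def expected_value(probs, values):
    assert abs(sum(probs) - 1.0) < 1e-9  # probabilities must sum to 1
    return sum(p * x for p, x in zip(probs, values))

# a fair coin flip paying 0 or 10:
print(expected_value([0.5, 0.5], [0, 10]))  # → 5.0
```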

#### Flashcard 1646645611788

Question
Ex-ante, derived from the Latin for " [...] "
before the event

status measured difficulty not learned 37% [default] 0
Ex-Ante
What is 'Ex-Ante'? Ex-ante, derived from the Latin for "before the event," is a term that refers to future events, such as future returns or prospects of a company. Ex-ante analysis helps to give an idea of future movements in price or the future impact of a n

#### Flashcard 1646689914124

Tags
Question
Do expected values make more sense viewed over the long run or the short run?
Over the long run.

status measured difficulty not learned 37% [default] 0

#### Flashcard 1648858631436

Tags
Question
The expected return on a portfolio of assets is the [...] of the [...] on the [...]
market-weighted average

expected returns

individual assets in the portfolio.

status measured difficulty not learned 37% [default] 0
Subject 8. Portfolio Expected Return and Variance
The expected return on a portfolio of assets is the market-weighted average of the expected returns on the individual assets in the portfolio. The variance of a portfolio's return consists of two components: the weighted average of the variance for individual assets and the weighted covariance between pairs of individual asset
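The weighted-average calculation above in a minimal sketch (the weights and expected returns are made-up values):

```python
# Expected portfolio return as the weighted average of the individual
# assets' expected returns.
weights = [0.6, 0.4]
exp_returns = [0.10, 0.05]
port_return = sum(w * r for w, r in zip(weights, exp_returns))
print(round(port_return, 2))  # → 0.08
```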

#### Flashcard 1648872787212

Tags
Question
What are the boundaries of covariance?
none. $$(-\infty) to (+\infty)$$

status measured difficulty not learned 37% [default] 0

#### Flashcard 1648875146508

Tags
Question
If you are measuring the height of people in cm and calculate the covariance of two observations, what would be the units of the covariance?
Squared centimeters (cm²): covariance carries the product of the two variables' units.

This is hard to interpret directly, which is one reason correlation (a unitless measure) is often preferred.

status measured difficulty not learned 37% [default] 0

#### Flashcard 1648910273804

Tags
Question
The assumption of equal prior probabilities.
Diffuse prior

status measured difficulty not learned 37% [default] 0

#### Flashcard 1652123897100

Tags
Question
A [...] is a listing in which the order of listing does not matter.
combination

status measured difficulty not learned 37% [default] 0
Subject 10. Principles of Counting
nlike the multiplication rule, factorial involves only a single group. It involves arranging items within a group, and the order of the arrangement does matter. The arrangement of ABCDE is different from the arrangement of ACBDE. A combination is a listing in which the order of listing does not matter. This describes the number of ways that we can choose r objects from a total of n objects, where the order in which the r objects is listed does not matter (The combination formula, or t

#### Flashcard 1652318407948

Tags
Question
Why can there never be more combinations than permutations for the same problem?

because permutations take into account all possible orderings of items, whereas combinations do not.

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Regarding counting, there can never be more combinations than permutations for the same problem, because permutations take into account all possible orderings of items, whereas combinations do not.

#### Original toplevel document

Subject 10. Principles of Counting
he ten stocks you are analyzing and invest $10,000 in one stock and $20,000 in another stock, how many ways can you select the stocks? Note that the order of your selection is important in this case. 10P2 = 10!/(10 − 2)! = 90. Note that there can never be more combinations than permutations for the same problem, because permutations take into account all possible orderings of items, whereas combinations do not.
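The excerpt's 10P2 = 90 count, plus the matching combination count, can be reproduced directly with the standard library:

```python
import math  # math.perm / math.comb require Python 3.8+

# Ordered vs. unordered selections of 2 stocks from 10; the
# combination count can never exceed the permutation count.
print(math.perm(10, 2))  # → 90
print(math.comb(10, 2))  # → 45
```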

#### Flashcard 1652320242956

Tags
Question
Do combinations take into account all possible orderings of items?
No

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Regarding counting, there can never be more combinations than permutations for the same problem, because permutations take into account all possible orderings of items, whereas combinations do not.

#### Original toplevel document

Subject 10. Principles of Counting
he ten stocks you are analyzing and invest $10,000 in one stock and $20,000 in another stock, how many ways can you select the stocks? Note that the order of your selection is important in this case. 10P2 = 10!/(10 − 2)! = 90. Note that there can never be more combinations than permutations for the same problem, because permutations take into account all possible orderings of items, whereas combinations do not.

#### Flashcard 1652323912972

Tags
#probability
Question
[...] states that for a normal distribution, nearly all of the data will fall within three standard deviations of the mean.
The empirical rule

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The empirical rule states that for a normal distribution, nearly all of the data will fall within three standard deviations of the mean. The empirical rule can be broken down into three parts: 68% of data

#### Original toplevel document

The empirical rule states that for a normal distribution, nearly all of the data will fall within three standard deviations of the mean. The empirical rule can be broken down into three parts: 68% of data falls within the first standard deviation from the mean. 95% fall within two standard deviations.
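The rule's three parts can be checked against the standard normal CDF with the standard library:

```python
from statistics import NormalDist

# Mass within 1, 2, and 3 standard deviations of the mean of a
# standard normal, matching the empirical rule's 68/95/99.7 parts.
z = NormalDist()  # mean 0, standard deviation 1
for k in (1, 2, 3):
    print(round(z.cdf(k) - z.cdf(-k), 4))  # → 0.6827, 0.9545, 0.9973
```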

#### Flashcard 1652329155852

Question
[...] comparisons of quarterly EPS are with the immediate prior quarter.
Sequential

status measured difficulty not learned 37% [default] 0

Open it
Sequential comparisons of quarterly EPS are with the immediate prior quarter. A sequential comparison stands in contrast to a comparison with the same quarter one year ago (another frequent type o

#### Flashcard 1652335971596

Tags
Question

General Formula for Labeling Problems

Multinomial Formula

status measured difficulty not learned 37% [default] 0

Open it
Multinomial Formula (General Formula for Labeling Problems). The number of ways that n objects can be labeled with k different labels, with n1 of the first type, n2 of the second type, and so on, with n
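A minimal sketch of that count, n! / (n1! n2! ... nk!); the label counts below are made up for illustration:

```python
from math import factorial

# Multinomial labeling count: ways to assign n objects to k labels,
# with n1, ..., nk objects of each type.
def multinomial(*counts):
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

# labeling 5 objects with 2 of one type and 3 of another:
print(multinomial(2, 3))  # → 10
```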

#### Flashcard 1652344622348

Tags
Question
The combination formula, or the binomial formula:

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
A combination is a listing in which the order of listing does not matter. This describes the number of ways that we can choose r objects from a total of n objects, where the order in which the r objects is listed does not matter (The combination formula, or the binomial formula):

#### Original toplevel document

Subject 10. Principles of Counting
nlike the multiplication rule, factorial involves only a single group. It involves arranging items within a group, and the order of the arrangement does matter. The arrangement of ABCDE is different from the arrangement of ACBDE. A combination is a listing in which the order of listing does not matter. This describes the number of ways that we can choose r objects from a total of n objects, where the order in which the r objects is listed does not matter (The combination formula, or the binomial formula): For example, if you select two of the ten stocks you are analyzing, how many ways can you select the stocks? 10! / [(10 - 2)! x 2!] = 45.

#### Flashcard 1652379487500

Tags
Question
A listing in which the order of the listed items does not matter.
Combination

status measured difficulty not learned 37% [default] 0

Multinomial formula
A mutual fund guide ranked 18 bond mutual funds by total returns for the year 2014. The guide also assigned each fund one of five risk labels: high risk (four funds), above-average risk (

#### Flashcard 1652389448972

Tags
Question
Do I want to assign every member of a group of size n to one of n slots (or tasks)? If the answer is yes, use [...]
n factorial.

status measured difficulty not learned 37% [default] 0

Counting
ossible outcomes? If the answer is yes, you may be able to use a tool in this section, and you can go to the second question. If the answer is no, the number of outcomes is infinite, and the tools in this section do not apply. Do I want to assign every member of a group of size n to one of n slots (or tasks)? If the answer is yes, use n factorial. If the answer is no, go to the third question. Do I want to count the number of ways to apply one of three or more labels to each member of a group? If the answer is yes

#### Flashcard 1652393381132

Tags
Question
Do I want to count the number of ways to apply one of three or more labels to each member of a group? If the answer is yes, use the [...]
multinomial formula.

status measured difficulty not learned 37% [default] 0

Counting
the tools in this section do not apply. Do I want to assign every member of a group of size n to one of n slots (or tasks)? If the answer is yes, use n factorial. If the answer is no, go to the third question. Do I want to count the number of ways to apply one of three or more labels to each member of a group? If the answer is yes, use the multinomial formula. If the answer is no, go to the fourth question. Do I want to count the number of ways that I can choose r objects from a total of n, when the order in which I list the r

#### Flashcard 1652398361868

Tags
Question
Do I want to count the number of ways I can choose r objects from a total of n, when the order in which I list the r objects is important? If the answer is yes, use the [...]
the permutation formula.

status measured difficulty not learned 37% [default] 0

Counting
al of n, when the order in which I list the r objects does not matter (can I give the r objects a label)? If the answer to these questions is yes, the combination formula applies. If the answer is no, go to the fifth question. Do I want to count the number of ways I can choose r objects from a total of n, when the order in which I list the r objects is important? If the answer is yes, the permutation formula applies. If the answer is no, go to question 6. Can the multiplication rule of counting be used? If it cannot, you may have to count the possibilities one by one, or use more adv

#### Flashcard 1654675344652

Tags
Question
[...], a computer-based tool for obtaining information on complex problems.
Monte Carlo simulation

status measured difficulty not learned 37% [default] 0

#### Flashcard 1729401589004

Question
By leveraging stochastic processes such as the beta and Dirichlet process (DP), these methods allow the data to drive the complexity of the learned model, while still permitting [...]
efficient inference algorithms.

Speculation: efficient inference is possible because the DP or BP imposes certain structure?

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
By leveraging stochastic processes such as the beta and Dirichlet process (DP), these methods allow the data to drive the complexity of the learned model, while still permitting efficient inference algorithms.

#### Original toplevel document (pdf)

cannot see any pdfs

#### Flashcard 1729419676940

Question
Almost all machine-learning tasks can be formulated as making inferences about [...] from the observed data
missing or latent data

Data can be understood in the broadest sense. It can be missing data, or model parameters, or even models themselves.

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Almost all machine-learning tasks can be formulated as making inferences about missing or latent data from the observed data

#### Original toplevel document (pdf)

cannot see any pdfs

#### Flashcard 1729421249804

Question
A model is considered to be well defined if it can [...] about unobserved data (BEFORE) having been trained on observed data
make forecasts or predictions

status measured difficulty not learned 37% [default] 0

#### pdf

cannot see any pdfs

#### Flashcard 1729425444108

Question
Bayesian optimization poses the question of finding function optima as a problem in [...]
sequential decision theory

For this reason it has great potential in reinforcement learning.

status measured difficulty not learned 37% [default] 0

#### pdf

cannot see any pdfs

#### Annotation 1729484950796

#probability
In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures (denoted r) occurs.

Negative binomial distribution - Wikipedia
In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures (denoted r) occurs. For example, if we define a 1 as failure, all non-1s as successes, and we throw a die repeatedly until the third time 1 appears (r = three failures), then the probability distribution
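Under this parameterization the pmf is C(k+r−1, k) (1−p)^r p^k for k successes before the r-th failure; a minimal sketch with sanity checks:

```python
from math import comb

# Negative binomial pmf: probability of exactly k successes
# (probability p each) before the r-th failure.
def nb_pmf(k, r, p):
    return comb(k + r - 1, k) * (1 - p) ** r * p ** k

# sanity checks: the mass sums to ~1, and the mean is p*r/(1-p)
total = sum(nb_pmf(k, 3, 0.5) for k in range(200))
mean = sum(k * nb_pmf(k, 3, 0.5) for k in range(200))
print(round(total, 6), round(mean, 6))  # → 1.0 3.0
```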

#### Annotation 1729497271564

#probability
The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a Poisson(λ) distribution, where λ is itself a random variable, distributed as a gamma distribution with shape = r and scale θ = p/(1 − p) or correspondingly rate β = (1 − p)/p .

Negative binomial distribution - Wikipedia
Gamma–Poisson mixture: The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a Poisson(λ) distribution, where λ is itself a random variable, distributed as a gamma distribution with shape = r and scale θ = p/(1 − p) or correspondingly rate β = (1 − p)/p. To display the intuition behind this statement, consider two independent Poisson processes, "Success" and "Failure", with intensities p and 1 − p. Together, the Success and Failure pr
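The mixture claim can be checked by simulation: draw λ from the stated gamma, then draw Poisson(λ), and compare the empirical mean with the negative binomial mean r·p/(1 − p). A Monte Carlo sketch (the sample size and tolerance are arbitrary choices):

```python
import random
from math import exp

# Gamma-Poisson mixture: lambda ~ Gamma(shape=r, scale=p/(1-p)),
# then X ~ Poisson(lambda). Mean should approach r*p/(1-p).
random.seed(0)

def poisson_sample(lam):
    # Knuth's multiplication method; fine for small lambda
    threshold, k, prod = exp(-lam), 0, random.random()
    while prod > threshold:
        k += 1
        prod *= random.random()
    return k

r, p = 3, 0.5
n = 20000
draws = [poisson_sample(random.gammavariate(r, p / (1 - p))) for _ in range(n)]
print(abs(sum(draws) / n - r * p / (1 - p)) < 0.1)
```

Note that `random.gammavariate(alpha, beta)` takes shape and scale, matching the shape = r, scale = p/(1 − p) parameterization in the excerpt.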

#### Annotation 1729501203724

#multivariate-normal-distribution
One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution.

Multivariate normal distribution - Wikipedia
In probability theory and statistics, the multivariate normal distribution or multivariate Gaussian distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly)

#### Annotation 1729503300876

#multivariate-normal-distribution
The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value

Multivariate normal distribution - Wikipedia
e definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.

#### Annotation 1729504873740

#multivariate-normal-distribution
The mutual information of a distribution is a special case of the Kullback–Leibler divergence in which P is the full multivariate distribution and Q is the product of the 1-dimensional marginal distributions.

Multivariate normal distribution - Wikipedia
Mutual information: The mutual information of a distribution is a special case of the Kullback–Leibler divergence in which P is the full multivariate distribution and Q is the product of the 1-dimensional marginal distributions.

#### Annotation 1729506970892

#multivariate-normal-distribution

In the bivariate case the expression for the mutual information is: I(x; y) = −(1/2) ln(1 − ρ²).

Multivariate normal distribution - Wikipedia
where ρ₀ is the correlation matrix constructed from Σ₀. In the bivariate case the expression for the mutual information is I(x; y) = −(1/2) ln(1 − ρ²). Cumulative distribution function: The notion of cumulative distribution function (cdf) in dimension 1 can be extended in two ways to the multidimensional case, based
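The bivariate formula is a one-liner to sketch; the correlation value below is made up:

```python
from math import log

# Bivariate Gaussian mutual information: I(x; y) = -(1/2) ln(1 - rho^2).
def bivariate_normal_mi(rho):
    return -0.5 * log(1 - rho ** 2)

print(round(bivariate_normal_mi(0.6), 4))  # → 0.2231
print(bivariate_normal_mi(0.0) == 0.0)     # zero correlation → zero MI
```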

#### Annotation 1729511165196

#linear-algebra
In mathematics, and in particular, algebra, a generalized inverse of an element x is an element y that has some properties of an inverse element but not necessarily all of them. Generalized inverses can be defined in any mathematical structure that involves associative multiplication, that is, in a semigroup.

Generalized inverse - Wikipedia
In mathematics, and in particular, algebra, a generalized inverse of an element x is an element y that has some properties of an inverse element but not necessarily all of them. Generalized inverses can be defined in any mathematical structure that involves associative multiplication, that is, in a semigroup. This article describes generalized inverses of a matrix A. Formally, given a matrix A ∈

#### Annotation 1729513262348

#linear-algebra
Formally, given a matrix A ∈ ℝ^(n×m) and a matrix A^g ∈ ℝ^(m×n), A^g is a generalized inverse of A if it satisfies the condition A A^g A = A.

Generalized inverse - Wikipedia
nverses can be defined in any mathematical structure that involves associative multiplication, that is, in a semigroup. This article describes generalized inverses of a matrix A. Formally, given a matrix A ∈ ℝ^(n×m) and a matrix A^g ∈ ℝ^(m×n), A^g is a generalized inverse of A if it satisfies the condition A A^g A = A. The purpose of constructing a generalized inverse of a matrix is to obtain a matrix that can serve as an inverse in some sense for a wider class of matrices than invertibl

#### Annotation 1729516145932

#matrix-inversion
A common use of the pseudoinverse is to compute a 'best fit' (least squares) solution to a system of linear equations that lacks a unique solution

Moore–Penrose inverse - Wikipedia
tegral operators in 1903. When referring to a matrix, the term pseudoinverse, without further specification, is often used to indicate the Moore–Penrose inverse. The term generalized inverse is sometimes used as a synonym for pseudoinverse. A common use of the pseudoinverse is to compute a 'best fit' (least squares) solution to a system of linear equations that lacks a unique solution (see below under § Applications). Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the

#### Annotation 1729518243084

#matrix-inversion

The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. It can be computed using the singular value decomposition.

Moore–Penrose inverse - Wikipedia
tion (see below under § Applications). Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the statement and proof of results in linear algebra. The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. It can be computed using the singular value decomposition.

#### Annotation 1729520602380

#matrix-inversion

For A ∈ M(m, n; K), a pseudoinverse of A is defined as a matrix A⁺ ∈ M(n, m; K) satisfying all of the following four criteria:

1. A A⁺ A = A (AA⁺ need not be the general identity matrix, but it maps all column vectors of A to themselves);
2. A⁺ A A⁺ = A⁺ (A⁺ is a weak inverse for the multiplicative semigroup);
3. (A A⁺)* = A A⁺ (AA⁺ is Hermitian); and
4. (A⁺ A)* = A⁺ A (A⁺A is also Hermitian).

The Moore–Penrose pseudoinverse A⁺ exists for any matrix A, but when the latter has full rank, A⁺ can be expressed as a simple algebraic formula.

In particular, when A has linearly independent columns (and thus matrix A*A is invertible), A⁺ can be computed as:

A⁺ = (A*A)⁻¹ A*

Moore–Penrose inverse - Wikipedia
where Iₙ ∈ M(n, n; K) denotes the n × n identity matrix. Definition: For A ∈ M(m, n; K), a pseudoinverse of A is defined as a matrix A⁺ ∈ M(n, m; K) satisfying all of the following four criteria: A A⁺ A = A (AA⁺ need not be the general identity matrix, but it maps all column vectors of A to themselves); A⁺ A A⁺ = A⁺ (A⁺ is a weak inverse for the multiplicative semigroup); (A A⁺)* = A A⁺ (AA⁺ is Hermitian); and (A⁺ A)* = A⁺ A (A⁺A is also Hermitian). A⁺ exists for any matrix A, but when the latter has full rank, A⁺ can be expressed as a simple algebraic formula. In particular, when A has linearly independent columns (and thus matrix A*A is invertible), A⁺ can be computed as A⁺ = (A*A)⁻¹ A*. This particular pseudoinverse constitutes a left inverse, since, in this case, A⁺A = I. When A has linearly independent rows (matrix AA* is invertible), A⁺ can be computed as A⁺ = A* (AA*)⁻¹. This is a right inverse, as AA⁺ = I. Properties: Proofs for some of these facts may be found on a separate page, Proofs involving the Moore–Penrose inverse. Existence and uniqueness: The pseu
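The full-column-rank formula can be verified numerically on a small real matrix; a pure-Python sketch (the matrix and helper names are illustrative, and real work would use a linear-algebra library):

```python
# A+ = (A^T A)^(-1) A^T for a real 3x2 matrix with independent
# columns, checked against Penrose condition 1: A A+ A = A.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inv2(M):  # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3x2, independent columns
At = transpose(A)
A_plus = matmul(inv2(matmul(At, A)), At)    # (A^T A)^(-1) A^T

AApA = matmul(matmul(A, A_plus), A)
ok = all(abs(AApA[i][j] - A[i][j]) < 1e-9
         for i in range(3) for j in range(2))
print(ok)
```

Since A here has linearly independent columns, this A⁺ is also a left inverse: A⁺A is the 2 × 2 identity (up to floating-point error).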

#### Annotation 1729523485964

#multivariate-normal-distribution
Conditional distributions

If the N-dimensional vector x is partitioned as x = [x1; x2] (with sizes q×1 and (N−q)×1), and μ and Σ are partitioned accordingly as μ = [μ1; μ2] and Σ = [[Σ11, Σ12], [Σ21, Σ22]], then the distribution of x1 conditional on x2 = a is multivariate normal, (x1 | x2 = a) ~ N(μ̄, Σ̄), with mean $$\bar{\mu} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2)$$ and covariance matrix $$\bar{\Sigma} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$ This matrix is the Schur complement of Σ22 in Σ. This means that to calculate the conditional covariance matrix, one inverts the overall covariance matrix, drops the rows and columns corresponding to the variables being conditioned upon, and then inverts back to get the conditional covariance matrix. Here $\Sigma_{22}^{-1}$ is the generalized inverse of $\Sigma_{22}$.

Note that knowing that x2 = a alters the variance, though the new variance does not depend on the specific value of a; perhaps more surprisingly, the mean is shifted by $\Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2)$; compare this with the situation of not knowing the value of a, in which case x1 would have distribution $\mathcal{N}_q(\mu_1, \Sigma_{11})$.

An interesting fact derived in order to prove this result is that the random vectors $x_2$ and $y_1 = x_1 - \Sigma_{12}\Sigma_{22}^{-1}x_2$ are independent.

The matrix Σ12Σ22⁻¹ is known as the matrix of regression coefficients.

...
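The conditional-mean formula and the Schur-complement ("invert, drop, invert back") claim above can be checked numerically. A minimal NumPy sketch; the matrix values are illustrative, not from the source:

```python
import numpy as np

# A 3-dimensional Gaussian, partitioned into x1 = first two dims, x2 = last dim.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]
a = np.array([2.5])  # observed value of x2

# Conditional parameters from the formulas above.
mu_bar = mu[:2] + S12 @ np.linalg.inv(S22) @ (a - mu[2:])
Sigma_bar = S11 - S12 @ np.linalg.inv(S22) @ S21

# Schur-complement route: invert Sigma, drop the conditioned rows/columns, invert back.
P = np.linalg.inv(Sigma)
assert np.allclose(Sigma_bar, np.linalg.inv(P[:2, :2]))
```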

Multivariate normal distribution - Wikipedia
#### Annotation 1729525845260

#multivariate-normal-distribution
To obtain the marginal distribution over a subset of multivariate normal random variables, one only needs to drop the irrelevant variables (the variables that one wants to marginalize out) from the mean vector and the covariance matrix. The proof for this follows from the definitions of multivariate normal distributions and linear algebra.
status not read

Multivariate normal distribution - Wikipedia

#### Annotation 1729527942412

#multivariate-normal-distribution
If Y = c + BX is an affine transformation of X ~ N(μ, Σ), where c is an M×1 vector of constants and B is a constant M×N matrix, then Y has a multivariate normal distribution with expected value c + Bμ and variance BΣBᵀ, i.e. Y ~ N(c + Bμ, BΣBᵀ). Corollaries: sums of Gaussians are Gaussian; marginals of a Gaussian are Gaussian.

status not read

Multivariate normal distribution - Wikipedia
#### Annotation 1729530039564

#multivariate-normal-distribution
The equidensity contours of a non-singular multivariate normal distribution are ellipsoids (i.e. linear transformations of hyperspheres) centered at the mean. Hence the multivariate normal distribution is an example of the class of elliptical distributions.

status not read

Multivariate normal distribution - Wikipedia

#### Annotation 1729532136716

#multivariate-normal-distribution
The directions of the principal axes of the ellipsoids are given by the eigenvectors of the covariance matrix Σ. The squared relative lengths of the principal axes are given by the corresponding eigenvalues.

status not read

Multivariate normal distribution - Wikipedia
If Σ = UΛUᵀ = UΛ^(1/2)(UΛ^(1/2))ᵀ is an eigendecomposition, where the columns of U are unit eigenvectors and Λ is a diagonal matrix of the eigenvalues, then:

#### Annotation 1729534496012

#multivariate-normal-distribution
The distribution N(μ, Σ) is in effect N(0, I) scaled by Λ^(1/2), rotated by U and translated by μ.

status not read

Multivariate normal distribution - Wikipedia

#### Annotation 1729537903884

#geometry
An ellipsoid is a surface that may be obtained from a sphere by deforming it by means of directional scalings, or more generally, of an affine transformation.

status not read

Ellipsoid - Wikipedia

#### Annotation 1729541311756

#topology
In geometry, an affine transformation, affine map or an affinity (from the Latin affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes. Also, sets of parallel lines remain parallel after an affine transformation.
An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line.

status not read

Affine transformation - Wikipedia

#### Annotation 1729544719628

#linear-algebra
#matrix-decomposition
In linear algebra, eigendecomposition (sometimes spectral decomposition) is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way.
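The eigendecomposition can be illustrated numerically with a small symmetric matrix (symmetric matrices are always diagonalizable); the matrix values are illustrative, not from the source:

```python
import numpy as np

# Symmetric matrix, hence diagonalizable with orthonormal eigenvectors.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

lam, Q = np.linalg.eigh(A)  # eigenvalues and unit eigenvectors
Lambda = np.diag(lam)

# A v = lambda v for each eigenpair, hence A Q = Q Lambda and A = Q Lambda Q^{-1}.
assert np.allclose(A @ Q, Q @ Lambda)
assert np.allclose(A, Q @ Lambda @ np.linalg.inv(Q))
```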
status not read

Eigendecomposition of a matrix - Wikipedia

#### Annotation 1729546816780

#linear-algebra
#matrix-decomposition
The eigendecomposition can be derived from the fundamental property of eigenvectors, Av = λv, and thus AQ = QΛ, which yields A = QΛQ⁻¹.

status not read

Eigendecomposition of a matrix - Wikipedia

#### Annotation 1729550224652

#gaussian-process
Viewed as a machine-learning algorithm, a Gaussian process uses lazy learning and a measure of the similarity between points (the kernel function) to predict the value for an unseen point from training data.

status not read

Gaussian process - Wikipedia
The prediction is not just an estimate for that point, but also carries uncertainty information: it is a one-dimensional Gaussian distribution (the marginal distribution at that point).

#### Annotation 1729553632524

#fourier-analysis
In probability theory and statistics, the characteristic function of any real-valued random variable completely defines its probability distribution.

status not read

Characteristic function (probability theory) - Wikipedia

#### Annotation 1729555729676

#fourier-analysis
If a random variable admits a probability density function, then the characteristic function is the Fourier transform of the probability density function.

status not read

Characteristic function (probability theory) - Wikipedia
#### Annotation 1729558613260

The Fourier transform (FT) decomposes a function of time (a signal) into the frequencies that make it up, in a way similar to how a musical chord can be expressed as the frequencies (or pitches) of its constituent notes.

status not read

Fourier transform - Wikipedia

#### Annotation 1729562283276

#gaussian-process
A key fact of Gaussian processes is that they can be completely defined by their second-order statistics.

status not read

Gaussian process - Wikipedia
#### Annotation 1729564380428

#gaussian-process
If a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the process' behaviour. Importantly, the non-negative definiteness of this function enables its spectral decomposition using the Karhunen–Loeve expansion.

status not read

Gaussian process - Wikipedia

#### Annotation 1729566477580

#gaussian-process
Stationarity refers to the process' behaviour regarding the separation of any two points x and x'. If the process is stationary, it depends on their separation, x − x', while if non-stationary it depends on the actual position of the points x and x'.

status not read

Gaussian process - Wikipedia
#### Annotation 1729568574732

#gaussian-process
If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be homogeneous; in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of the observer.

status not read

Gaussian process - Wikipedia

#### Annotation 1729571196172

#gaussian-process
If we expect that for "near-by" input points x and x' their corresponding output points y and y' are "near-by" also, then the assumption of continuity is present.

status not read

Gaussian process - Wikipedia
Ultimately Gaussian processes translate as taking priors on functions and the smoothness of these priors can be induced by the covariance function.  <span>If we expect that for "near-by" input points x and x' their corresponding output points y and y' to be "near-by" also, then the assumption of continuity is present. If we wish to allow for significant displacement then we might choose a rougher covariance function. Extreme examples of the behaviour is the Ornstein–Uhlenbeck covariance function and #### Annotation 1729573293324 #gaussian-process Periodicity refers to inducing periodic patterns within the behaviour of the process. Formally, this is achieved by mapping the input x to a two dimensional vector u(x) = (cos(x), sin(x)). status not read Gaussian process - Wikipedia en we might choose a rougher covariance function. Extreme examples of the behaviour is the Ornstein–Uhlenbeck covariance function and the squared exponential where the former is never differentiable and the latter infinitely differentiable. <span>Periodicity refers to inducing periodic patterns within the behaviour of the process. Formally, this is achieved by mapping the input x to a two dimensional vector u(x) = (cos(x), sin(x)). Usual covariance functions[edit source] [imagelink] The effect of choosing different kernels on the prior function distribution of the Gaussian process. Left is a squared expon #### Annotation 1729575652620 #gaussian-process Importantly, a complicated covariance function can be defined as a linear combination of other simpler covariance functions in order to incorporate different insights about the data-set at hand. status not read Gaussian process - Wikipedia \nu } and Γ ( ν ) \Gamma (\nu )} is the gamma function evaluated at ν \nu } . <span>Importantly, a complicated covariance function can be defined as a linear combination of other simpler covariance functions in order to incorporate different insights about the data-set at hand. 
#### Annotation 1729578011916

#multivariate-normal-distribution
In the bivariate case where x is partitioned into X1 and X2, the conditional distribution of X1 given X2 is

$$X_1 \mid X_2 = x_2 \ \sim\ \mathcal{N}\left(\mu_1 + \frac{\sigma_1}{\sigma_2}\rho\,(x_2 - \mu_2),\ (1-\rho^2)\sigma_1^2\right)$$

where ρ is the correlation coefficient between X1 and X2.

status not read

Multivariate normal distribution - Wikipedia

#### Annotation 1729581419788

In the Indian buffet process, the rows of Z correspond to customers and the columns correspond to dishes in an infinitely long buffet.

status not read

Indian buffet process - Wikipedia
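The bivariate conditional distribution is the one-dimensional special case of the general conditioning formulas, with Σ12 = ρσ1σ2 and Σ22 = σ2². A quick numerical cross-check with illustrative parameter values:

```python
import numpy as np

# Bivariate Gaussian parameters (illustrative values).
mu1, mu2 = 1.0, -2.0
s1, s2, rho = 2.0, 0.5, 0.6
x2 = 0.0  # observed value of X2

# Bivariate special case of the conditional-distribution formulas.
cond_mean = mu1 + (s1 / s2) * rho * (x2 - mu2)
cond_var = (1 - rho**2) * s1**2

# Same result from the general matrix form, with Sigma12 = rho*s1*s2, Sigma22 = s2**2.
Sigma12, Sigma22 = rho * s1 * s2, s2**2
assert np.isclose(cond_mean, mu1 + Sigma12 / Sigma22 * (x2 - mu2))
assert np.isclose(cond_var, s1**2 - Sigma12**2 / Sigma22)
```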
#### Annotation 1729584827660

In mathematics, the n-th harmonic number is the sum of the reciprocals of the first n natural numbers:

$$H_n = 1 + \frac12 + \frac13 + \cdots + \frac1n = \sum_{k=1}^{n} \frac1k.$$

status not read

Harmonic number - Wikipedia

#### Annotation 1729588497676

A trend stationary process is not strictly stationary, but can easily be transformed into a stationary process by removing the underlying trend, which is solely a function of time.

status not read

Stationary process - Wikipedia

#### Annotation 1729590594828

Processes with one or more unit roots can be made stationary through differencing.

status not read

Stationary process - Wikipedia
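The unit-root claim can be illustrated with a random walk, the simplest unit-root process: first-differencing recovers its stationary innovations. A sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random walk has a unit root and is non-stationary (its variance grows with t).
eps = rng.normal(size=1000)   # stationary white-noise innovations
walk = np.cumsum(eps)         # random walk: y_t = y_{t-1} + eps_t

# First-differencing recovers the stationary innovations exactly.
diffed = np.diff(walk)
assert np.allclose(diffed, eps[1:])
```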
#### Annotation 1729593478412

#finance
In finance, mean reversion is the assumption that a stock's price will tend to move to the average price over time.

status not read

Mean reversion (finance) - Wikipedia

#### Flashcard 1729595837708

Tags
#finance
Question
In finance, [...] is the assumption that a stock's price will tend to move to the average price over time.
Answer
mean reversion

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In finance, mean reversion is the assumption that a stock's price will tend to move to the average price over time.

#### Original toplevel document

Mean reversion (finance) - Wikipedia
#### Flashcard 1729597410572

Question
processes with one or more unit roots can be made stationary through [...].
Answer
differencing

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
processes with one or more unit roots can be made stationary through differencing.

#### Original toplevel document

Stationary process - Wikipedia

#### Flashcard 1729598983436

Question
A trend stationary process is not strictly stationary, but can easily be transformed into a stationary process by [...], which is solely a function of time.
Answer
removing the underlying trend

This operation sounds too frequentist: the uncertainty related to the trend is imposed on the trend-removed process. Refer to Jaynes later.

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
A trend stationary process is not strictly stationary, but can easily be transformed into a stationary process by removing the underlying trend, which is solely a function of time.

#### Original toplevel document

Stationary process - Wikipedia
#### Flashcard 1729601342732

Question
the [...] is defined as $H_n = \sum_{k=1}^{n} \frac1k$
Answer
n-th harmonic number

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In mathematics, the n-th harmonic number is the sum of the reciprocals of the first n natural numbers.

#### Original toplevel document

Harmonic number - Wikipedia

#### Flashcard 1729603702028

Question
In the Indian buffet process, the rows of Z correspond to [...] and the columns correspond to [...] in an infinitely long buffet.
Answer
customers, dishes

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In the Indian buffet process, the rows of Z correspond to customers and the columns correspond to dishes in an infinitely long buffet.
#### Original toplevel document Indian buffet process - Wikipedia among the columns in Z Z} . The parameter α \alpha } controls the expected number of features present in each observation. <span>In the Indian buffet process, the rows of Z Z} correspond to customers and the columns correspond to dishes in an infinitely long buffet. The first customer takes the first P o i s s o n ( α ) #### Flashcard 1729607109900 Tags #multivariate-normal-distribution Question In the bivariate case, the conditional mean of X1 given X2 is [...] Answer \( \mu_1 + \frac{\sigma_1}{\sigma_2} \rho (x_2 - \mu_2)$$

where ρ is the correlation coefficient between X1 and X2.
Apparently both the correlation and variance should play a part!

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In the bivariate case where x is partitioned into X1 and X2, the conditional distribution of X1 given X2 is $$X_1 \mid X_2 = x_2 \sim \mathcal{N}\!\left(\mu_1 + \frac{\sigma_1}{\sigma_2}\rho(x_2 - \mu_2),\; (1-\rho^2)\sigma_1^2\right)$$ where ρ is the correlation coefficient between X1 and X2.

#### Original toplevel document

Multivariate normal distribution - Wikipedia
$$\mathbf{y}_1 = \mathbf{x}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2$$ are independent. The matrix $\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}$ is known as the matrix of regression coefficients. Bivariate case[edit source] <span>In the bivariate case where x is partitioned into X1 and X2, the conditional distribution of X1 given X2 is $$X_1 \mid X_2 = x_2 \;\sim\; \mathcal{N}\!\left(\mu_1 + \frac{\sigma_1}{\sigma_2}\rho(x_2 - \mu_2),\; (1-\rho^2)\sigma_1^2\right).$$ where ρ is the correlation coefficient between X1 and X2. Bivariate conditional expectation[edit source] In the general case[edit source] (
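
The conditional-mean and conditional-variance formulas above are easy to sanity-check numerically; a minimal sketch with illustrative parameter values (none of them from the source):

```python
import numpy as np

# Illustrative parameters for a bivariate normal (X1, X2).
mu1, mu2 = 1.0, -2.0
sigma1, sigma2 = 2.0, 0.5
rho = 0.8

# Conditional distribution of X1 given X2 = x2:
#   mean = mu1 + (sigma1/sigma2) * rho * (x2 - mu2)
#   var  = (1 - rho**2) * sigma1**2
x2 = -1.0
cond_mean = mu1 + (sigma1 / sigma2) * rho * (x2 - mu2)
cond_var = (1.0 - rho**2) * sigma1**2

# Monte Carlo check: sample the joint, keep draws with X2 near x2.
rng = np.random.default_rng(0)
cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                [rho * sigma1 * sigma2, sigma2**2]])
samples = rng.multivariate_normal([mu1, mu2], cov, size=500_000)
near = samples[np.abs(samples[:, 1] - x2) < 0.01, 0]
```

The mean of `near` should sit close to `cond_mean`, confirming that both the correlation and the two standard deviations enter the conditional mean.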

#### Flashcard 1729609469196

Tags
#gaussian-process
Question
Importantly, a complicated covariance function can be defined as a [...] of other simpler covariance functions in order to incorporate different insights about the data-set at hand.
linear combination

Perhaps more than linear combination.

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Importantly, a complicated covariance function can be defined as a linear combination of other simpler covariance functions in order to incorporate different insights about the data-set at hand.

#### Original toplevel document

Gaussian process - Wikipedia
$\nu$ and $\Gamma(\nu)$ is the gamma function evaluated at $\nu$. <span>Importantly, a complicated covariance function can be defined as a linear combination of other simpler covariance functions in order to incorporate different insights about the data-set at hand. Clearly, the inferential results are dependent on the values of the hyperparameters θ (e.g. ℓ and σ) defining the model's behaviour. A popular choice for θ is to provide maximum a pos
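
The linear-combination idea can be sketched in a few lines; the RBF and periodic kernel forms are standard, but the weights and grid here are illustrative choices of mine:

```python
import numpy as np

def rbf(x1, x2, length=1.0):
    return np.exp(-0.5 * (x1 - x2)**2 / length**2)

def periodic(x1, x2, period=2.0, length=1.0):
    return np.exp(-2.0 * np.sin(np.pi * (x1 - x2) / period)**2 / length**2)

def combined(x1, x2):
    # A linear combination of simpler kernels is again a valid kernel.
    return 0.7 * rbf(x1, x2) + 0.3 * periodic(x1, x2)

x = np.linspace(0, 10, 50)
K = combined(x[:, None], x[None, :])

# Validity check: the Gram matrix stays symmetric positive semi-definite.
eigvals = np.linalg.eigvalsh(K)
```

The non-negative eigenvalues of `K` confirm the combination is still a legitimate covariance function.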

#### Flashcard 1729611042060

Tags
#gaussian-process
Question

Periodicity maps the input x to a two dimensional vector [...]

u(x) = (cos(x), sin(x)).

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Periodicity refers to inducing periodic patterns within the behaviour of the process. Formally, this is achieved by mapping the input x to a two dimensional vector u(x) = (cos(x), sin(x)).

#### Original toplevel document

Gaussian process - Wikipedia
en we might choose a rougher covariance function. Extreme examples of the behaviour is the Ornstein–Uhlenbeck covariance function and the squared exponential where the former is never differentiable and the latter infinitely differentiable. <span>Periodicity refers to inducing periodic patterns within the behaviour of the process. Formally, this is achieved by mapping the input x to a two dimensional vector u(x) = (cos(x), sin(x)). Usual covariance functions[edit source] [imagelink] The effect of choosing different kernels on the prior function distribution of the Gaussian process. Left is a squared expon
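
A sketch of the u(x) = (cos(x), sin(x)) trick, assuming a squared-exponential kernel applied in u-space (my choice of base kernel, not the source's):

```python
import numpy as np

def u(x):
    # Map input x onto the unit circle; distances in u-space repeat with period 2*pi.
    return np.stack([np.cos(x), np.sin(x)], axis=-1)

def rbf_on_u(x1, x2, length=1.0):
    d2 = np.sum((u(x1) - u(x2))**2, axis=-1)
    return np.exp(-0.5 * d2 / length**2)

# The resulting kernel is periodic in x with period 2*pi:
k_a = rbf_on_u(np.array(0.3), np.array(1.0))
k_b = rbf_on_u(np.array(0.3 + 2 * np.pi), np.array(1.0))
```

Shifting an input by a full period leaves the covariance unchanged, which is exactly the induced periodic pattern.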

#### Flashcard 1729612614924

Tags
#gaussian-process
Question
If we expect that for "near-by" input points x and x' their corresponding output points y and y' to be "near-by" also, then the assumption of [...] is present.
continuity

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
If we expect that for "near-by" input points x and x' their corresponding output points y and y' to be "near-by" also, then the assumption of continuity is present.

#### Original toplevel document

Gaussian process - Wikipedia
r the lack of them) in the behaviour of the process given the location of the observer. Ultimately Gaussian processes translate as taking priors on functions and the smoothness of these priors can be induced by the covariance function.  <span>If we expect that for "near-by" input points x and x' their corresponding output points y and y' to be "near-by" also, then the assumption of continuity is present. If we wish to allow for significant displacement then we might choose a rougher covariance function. Extreme examples of the behaviour is the Ornstein–Uhlenbeck covariance function and

#### Annotation 1729614187788

#gaussian-process
If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic.

#### Parent (intermediate) annotation

Open it
If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be homogeneous;  in practice these properties reflect the differences (or rather the lack of them) in the be

#### Original toplevel document

Gaussian process - Wikipedia
stationary, it depends on their separation, x − x', while if non-stationary it depends on the actual position of the points x and x'. For example, the special case of an Ornstein–Uhlenbeck process, a Brownian motion process, is stationary. <span>If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be homogeneous;  in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of the observer. Ultimately Gaussian processes translate as taking priors on functions and the smoothness of these priors can be induced by the covariance function.  If we expect that for "ne
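
Isotropy can be checked directly: a kernel built from |x − x'| alone is unchanged when both inputs are rotated by the same angle. A small sketch (kernel and point values are illustrative):

```python
import numpy as np

def iso_kernel(x1, x2, length=1.0):
    # Depends only on the Euclidean distance |x - x'|, not the direction.
    r = np.linalg.norm(x1 - x2)
    return np.exp(-0.5 * (r / length)**2)

rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=2), rng.normal(size=2)

# Rotate both points by the same angle: distance, hence the kernel, is unchanged.
theta = 0.9
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
k_orig = iso_kernel(x1, x2)
k_rot = iso_kernel(R @ x1, R @ x2)
```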

#### Flashcard 1729615760652

Tags
#gaussian-process
Question
an isotropic process depends only on distance, not [...]
direction

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic.

#### Original toplevel document

Gaussian process - Wikipedia
stationary, it depends on their separation, x − x', while if non-stationary it depends on the actual position of the points x and x'. For example, the special case of an Ornstein–Uhlenbeck process, a Brownian motion process, is stationary. <span>If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be homogeneous;  in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of the observer. Ultimately Gaussian processes translate as taking priors on functions and the smoothness of these priors can be induced by the covariance function.  If we expect that for "ne

#### Flashcard 1729617333516

Tags
#gaussian-process
Question
a stochastic process is called [...] if it depends only on distance but not the direction
isotropic

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic.

#### Original toplevel document

Gaussian process - Wikipedia
stationary, it depends on their separation, x − x', while if non-stationary it depends on the actual position of the points x and x'. For example, the special case of an Ornstein–Uhlenbeck process, a Brownian motion process, is stationary. <span>If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be homogeneous;  in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of the observer. Ultimately Gaussian processes translate as taking priors on functions and the smoothness of these priors can be induced by the covariance function.  If we expect that for "ne

#### Annotation 1729618906380

#gaussian-process
A process that is concurrently stationary and isotropic is considered to be homogeneous; in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of the observer.

#### Parent (intermediate) annotation

Open it
If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be homogeneous;  in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of the observer.

#### Original toplevel document

Gaussian process - Wikipedia
stationary, it depends on their separation, x − x', while if non-stationary it depends on the actual position of the points x and x'. For example, the special case of an Ornstein–Uhlenbeck process, a Brownian motion process, is stationary. <span>If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be homogeneous;  in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of the observer. Ultimately Gaussian processes translate as taking priors on functions and the smoothness of these priors can be induced by the covariance function.  If we expect that for "ne

#### Flashcard 1729620479244

Tags
#gaussian-process
Question
a homogeneous process behaves the same regardless of the location of [...].
the observer

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
A process that is concurrently stationary and isotropic is considered to be homogeneous;  in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of <span>the observer.

#### Original toplevel document

Gaussian process - Wikipedia
stationary, it depends on their separation, x − x', while if non-stationary it depends on the actual position of the points x and x'. For example, the special case of an Ornstein–Uhlenbeck process, a Brownian motion process, is stationary. <span>If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x', then the process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be homogeneous;  in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of the observer. Ultimately Gaussian processes translate as taking priors on functions and the smoothness of these priors can be induced by the covariance function.  If we expect that for "ne

#### Flashcard 1729622052108

Tags
#gaussian-process
Question
A stationary process' behaviour depends on the separation of any two points, not their [...].
the actual positions

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Stationarity refers to the process' behaviour regarding the separation of any two points x and x' . If the process is stationary, it depends on their separation, x − x', while if non-stationary it depends on the actual position of the points x and x'.

#### Original toplevel document

Gaussian process - Wikipedia
initeness of this function enables its spectral decomposition using the Karhunen–Loeve expansion. Basic aspects that can be defined through the covariance function are the process' stationarity, isotropy, smoothness and periodicity.   <span>Stationarity refers to the process' behaviour regarding the separation of any two points x and x' . If the process is stationary, it depends on their separation, x − x', while if non-stationary it depends on the actual position of the points x and x'. For example, the special case of an Ornstein–Uhlenbeck process, a Brownian motion process, is stationary. If the process depends only on |x − x'|, the Euclidean distance (not the dire
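
The dependence on separation only can be demonstrated by shifting both inputs by the same amount; a sketch with an illustrative squared-exponential (stationary) kernel:

```python
import numpy as np

# A stationary kernel depends only on the separation x - x', so translating
# both inputs by the same amount c leaves the covariance unchanged.
def stationary_k(x1, x2, length=1.0):
    return np.exp(-0.5 * ((x1 - x2) / length)**2)

x1, x2, c = 0.4, 2.1, 7.3
k_orig = stationary_k(x1, x2)
k_shifted = stationary_k(x1 + c, x2 + c)
```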

#### Annotation 1729623624972

#gaussian-process
if a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the process' behaviour.

#### Parent (intermediate) annotation

Open it
if a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the process' behaviour. Importantly the non-negative definiteness of this function enables its spectral decomposition using the Karhunen–Loeve expansion.

#### Original toplevel document

Gaussian process - Wikipedia
} can be shown to be the covariances and means of the variables in the process.  Covariance functions[edit source] A key fact of Gaussian processes is that they can be completely defined by their second-order statistics.  Thus, <span>if a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the process' behaviour. Importantly the non-negative definiteness of this function enables its spectral decomposition using the Karhunen–Loeve expansion. Basic aspects that can be defined through the covariance function are the process' stationarity, isotropy, smoothness and periodicity.   Stationarity refers to the process' beha
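
Concretely: once the kernel is fixed, any finite set of inputs has distribution N(0, K), which is all that is needed to draw samples from the zero-mean process. A sketch (kernel, grid, and jitter are illustrative):

```python
import numpy as np

# With a zero mean function, choosing the covariance function fixes the GP:
# any finite set of inputs yields a multivariate normal N(0, K).
def k(x1, x2, length=1.0):
    return np.exp(-0.5 * (x1 - x2)**2 / length**2)

x = np.linspace(0, 5, 40)
K = k(x[:, None], x[None, :]) + 1e-10 * np.eye(len(x))  # jitter for stability

rng = np.random.default_rng(0)
draws = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
```

Each row of `draws` is one sample path of the process evaluated on the grid.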

#### Flashcard 1729625197836

Tags
#gaussian-process
Question
if a Gaussian process is assumed to have mean zero, defining [...] completely defines the process' behaviour.
the covariance function

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
if a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the process' behaviour.

#### Original toplevel document

Gaussian process - Wikipedia
} can be shown to be the covariances and means of the variables in the process.  Covariance functions[edit source] A key fact of Gaussian processes is that they can be completely defined by their second-order statistics.  Thus, <span>if a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the process' behaviour. Importantly the non-negative definiteness of this function enables its spectral decomposition using the Karhunen–Loeve expansion. Basic aspects that can be defined through the covariance function are the process' stationarity, isotropy, smoothness and periodicity.   Stationarity refers to the process' beha

#### Flashcard 1729626770700

Tags
#gaussian-process
Question
In Gaussian process, the non-negative definiteness of the covariance function enables its [...] using the Karhunen–Loeve expansion.
spectral decomposition

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
if a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the process' behaviour. Importantly the non-negative definiteness of this function enables its spectral decomposition using the Karhunen–Loeve expansion.

#### Original toplevel document

Gaussian process - Wikipedia
} can be shown to be the covariances and means of the variables in the process.  Covariance functions[edit source] A key fact of Gaussian processes is that they can be completely defined by their second-order statistics.  Thus, <span>if a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the process' behaviour. Importantly the non-negative definiteness of this function enables its spectral decomposition using the Karhunen–Loeve expansion. Basic aspects that can be defined through the covariance function are the process' stationarity, isotropy, smoothness and periodicity.   Stationarity refers to the process' beha
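
In finite dimensions the Karhunen–Loeve idea reduces to eigendecomposing the Gram matrix of the covariance function; a sketch (kernel and grid are illustrative):

```python
import numpy as np

def k(x1, x2, length=1.0):
    return np.exp(-0.5 * (x1 - x2)**2 / length**2)

x = np.linspace(0, 1, 30)
K = k(x[:, None], x[None, :])

# Non-negative definiteness permits a spectral decomposition K = V diag(w) V^T,
# the finite-dimensional analogue of the Karhunen-Loeve expansion.
w, V = np.linalg.eigh(K)
K_rebuilt = (V * w) @ V.T
```

The eigenvalues `w` are (numerically) non-negative and the decomposition reconstructs `K` exactly.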

#### Flashcard 1729628343564

Tags
#gaussian-process
Question
Gaussian processes can be completely defined by their [...].
second-order statistics

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
A key fact of Gaussian processes is that they can be completely defined by their second-order statistics.

#### Original toplevel document

Gaussian process - Wikipedia
$\mu_\ell$ can be shown to be the covariances and means of the variables in the process.  Covariance functions[edit source] <span>A key fact of Gaussian processes is that they can be completely defined by their second-order statistics.  Thus, if a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the process' behaviour. Importantly the non-negative definiteness of t

#### Flashcard 1729630178572

Question
The Fourier transform (FT) decomposes a function of time (a signal) into [...] that make it up
the frequencies

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The Fourier transform (FT) decomposes a function of time (a signal) into the frequencies that make it up

#### Original toplevel document

Fourier transform - Wikipedia
ctions for the group of translations. Fourier transforms Continuous Fourier transform Fourier series Discrete-time Fourier transform Discrete Fourier transform Discrete Fourier transform over a ring Fourier analysis Related transforms <span>The Fourier transform (FT) decomposes a function of time (a signal) into the frequencies that make it up, in a way similar to how a musical chord can be expressed as the frequencies (or pitches) of its constituent notes. The Fourier transform of a function of time itself is a complex-value
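
A minimal numpy sketch of the decomposition (signal frequencies and sampling rate are chosen for illustration):

```python
import numpy as np

# A signal built from 5 Hz and 12 Hz sinusoids, sampled at 100 Hz for 1 s.
fs = 100
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# The FFT exposes the frequencies that make the signal up.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two strongest frequency components recovered from the spectrum:
top2 = freqs[np.argsort(np.abs(spectrum))[-2:]]
```

The recovered components are exactly the 5 Hz and 12 Hz "notes" the signal was built from.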

#### Flashcard 1729631751436

Tags
#fourier-analysis
Question
If a random variable admits a probability density function, then the [...] is the Fourier transform of the probability density function.
characteristic function

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
If a random variable admits a probability density function, then the characteristic function is the Fourier transform of the probability density function.

#### Original toplevel document

Characteristic function (probability theory) - Wikipedia
c around the origin; however characteristic functions may generally be complex-valued. In probability theory and statistics, the characteristic function of any real-valued random variable completely defines its probability distribution. <span>If a random variable admits a probability density function, then the characteristic function is the Fourier transform of the probability density function. Thus it provides the basis of an alternative route to analytical results compared with working directly with probability density functions or cumulative distribution functions. There ar
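
This relationship can be checked empirically: the sample average of exp(itX) should match the known normal characteristic function exp(itμ − t²σ²/2). A sketch with illustrative parameters:

```python
import numpy as np

# Empirical characteristic function of N(mu, sigma^2) samples vs. the exact
# Fourier transform of its density: phi(t) = exp(i*t*mu - t^2*sigma^2/2).
mu, sigma = 1.5, 0.7
rng = np.random.default_rng(0)
x = rng.normal(mu, sigma, size=200_000)

t = 0.8
phi_empirical = np.mean(np.exp(1j * t * x))
phi_exact = np.exp(1j * t * mu - t**2 * sigma**2 / 2)
```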

#### Flashcard 1729633324300

Tags
#fourier-analysis
Question
the characteristic function of any real-valued random variable completely defines its [...].
probability distribution

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In probability theory and statistics, the characteristic function of any real-valued random variable completely defines its probability distribution.

#### Original toplevel document

Characteristic function (probability theory) - Wikipedia
aracteristic function of a uniform U(–1,1) random variable. This function is real-valued because it corresponds to a random variable that is symmetric around the origin; however characteristic functions may generally be complex-valued. I<span>n probability theory and statistics, the characteristic function of any real-valued random variable completely defines its probability distribution. If a random variable admits a probability density function, then the characteristic function is the Fourier transform of the probability density function. Thus it provides the basis of

#### Flashcard 1729634897164

Tags
#gaussian-process
Question
Viewed as a machine-learning algorithm, a Gaussian process uses [...] and a measure of the similarity between points (the kernel function) to predict the value for an unseen point from training data.
lazy learning

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Viewed as a machine-learning algorithm, a Gaussian process uses lazy learning and a measure of the similarity between points (the kernel function) to predict the value for an unseen point from training data.

#### Original toplevel document

Gaussian process - Wikipedia
f them is normally distributed. The distribution of a Gaussian process is the joint distribution of all those (infinitely many) random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space. <span>Viewed as a machine-learning algorithm, a Gaussian process uses lazy learning and a measure of the similarity between points (the kernel function) to predict the value for an unseen point from training data. The prediction is not just an estimate for that point, but also has uncertainty information—it is a one-dimensional Gaussian distribution (which is the marginal distribution at that poi

#### Flashcard 1729636470028

Tags
#gaussian-process
Question
Viewed as a machine-learning algorithm, a Gaussian process uses lazy learning and a measure of [...] to predict the value for an unseen point from training data.
the similarity between points

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Viewed as a machine-learning algorithm, a Gaussian process uses lazy learning and a measure of the similarity between points (the kernel function) to predict the value for an unseen point from training data.

#### Original toplevel document

Gaussian process - Wikipedia
f them is normally distributed. The distribution of a Gaussian process is the joint distribution of all those (infinitely many) random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space. <span>Viewed as a machine-learning algorithm, a Gaussian process uses lazy learning and a measure of the similarity between points (the kernel function) to predict the value for an unseen point from training data. The prediction is not just an estimate for that point, but also has uncertainty information—it is a one-dimensional Gaussian distribution (which is the marginal distribution at that poi
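
A minimal sketch of such a prediction: the training data is kept as-is (the "lazy learning" part), and the posterior at a query point is a one-dimensional Gaussian. Kernel, data, and noise level are all illustrative:

```python
import numpy as np

def k(a, b, length=1.0):
    # Squared-exponential similarity between points (the kernel function).
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / length**2)

# Training data, retained verbatim; all work happens at query time.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(X)
noise = 1e-6

Xs = np.array([1.5])                      # unseen query point
K = k(X, X) + noise * np.eye(len(X))
Ks = k(X, Xs)

# Posterior at the query point: a one-dimensional Gaussian (mean + variance).
mean = Ks.T @ np.linalg.solve(K, y)
var = k(Xs, Xs) - Ks.T @ np.linalg.solve(K, Ks)
```

The prediction carries uncertainty information: `var` shrinks near training points and grows away from them.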

#### Flashcard 1729641712908

Tags
#linear-algebra #matrix-decomposition
Question
The eigendecomposition can be derived from [...], $$\mathbf{A}\mathbf{v} = \lambda\mathbf{v},$$ and thus $$\mathbf{A}\mathbf{Q} = \mathbf{Q}\mathbf{\Lambda},$$ which yields $$\mathbf{A} = \mathbf{Q}\,\mathbf{\Lambda}\,\mathbf{Q}^{-1}.$$
the fundamental property of eigenvectors

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The eigendecomposition can be derived from the fundamental property of eigenvectors, $$\mathbf{A}\mathbf{v} = \lambda\mathbf{v},$$ and thus $$\mathbf{A}\mathbf{Q} = \mathbf{Q}\mathbf{\Lambda},$$ which yields $$\mathbf{A} = \mathbf{Q}\,\mathbf{\Lambda}\,\mathbf{Q}^{-1}.$$

#### Original toplevel document

Eigendecomposition of a matrix - Wikipedia
$v_i\ (i=1,\dots,N)$ can also be used as the columns of Q. That can be understood by noting that the magnitude of the eigenvectors in Q gets canceled in the decomposition by the presence of $Q^{-1}$. <span>The decomposition can be derived from the fundamental property of eigenvectors: $$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$$ and thus $$\mathbf{A}\mathbf{Q} = \mathbf{Q}\mathbf{\Lambda}$$ which yields $$\mathbf{A} = \mathbf{Q}\,\mathbf{\Lambda}\,\mathbf{Q}^{-1}.$$ Example[edit source] Taking a 2 × 2 real matrix A = [
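
The derivation can be verified numerically; a small sketch on an arbitrary 2 × 2 matrix (values illustrative):

```python
import numpy as np

# Verify A = Q Lambda Q^{-1} on a small diagonalizable matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, Q = np.linalg.eig(A)   # columns of Q are eigenvectors
Lam = np.diag(eigvals)

A_rebuilt = Q @ Lam @ np.linalg.inv(Q)

# The fundamental property A v = lambda v holds column by column,
# i.e. A Q = Q Lambda.
residual = A @ Q - Q @ Lam
```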

#### Flashcard 1729643285772

Tags
#linear-algebra #matrix-decomposition
Question
The eigendecomposition decomposes matrix A to [...].
$$\mathbf{Q}\,\mathbf{\Lambda}\,\mathbf{Q}^{-1}$$

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The eigendecomposition can be derived from the fundamental property of eigenvectors: and thus which yields .

#### Original toplevel document

Eigendecomposition of a matrix - Wikipedia
$v_i\ (i=1,\dots,N)$ can also be used as the columns of Q. That can be understood by noting that the magnitude of the eigenvectors in Q gets canceled in the decomposition by the presence of $Q^{-1}$. <span>The decomposition can be derived from the fundamental property of eigenvectors: $$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$$ and thus $$\mathbf{A}\mathbf{Q} = \mathbf{Q}\mathbf{\Lambda}$$ which yields $$\mathbf{A} = \mathbf{Q}\,\mathbf{\Lambda}\,\mathbf{Q}^{-1}.$$ Example[edit source] Taking a 2 × 2 real matrix A = [

#### Flashcard 1729646955788

Tags
#linear-algebra #matrix-decomposition
Question

[...] is the factorization of a matrix into a canonical form

eigendecomposition

Also called spectral decomposition

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In linear algebra, eigendecomposition or sometimes spectral decomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonal

#### Original toplevel document

Eigendecomposition of a matrix - Wikipedia
| ocultar ahora Eigendecomposition of a matrix From Wikipedia, the free encyclopedia (Redirected from Eigendecomposition) Jump to: navigation, search <span>In linear algebra, eigendecomposition or sometimes spectral decomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way. Contents [hide] 1 Fundamental theory of matrix eigenvectors and eigenvalues 2 Eigendecomposition of a matrix 2.1 Example 2.2 Matrix inverse via eigendecomposition 2.2.1 Pr

#### Flashcard 1729648528652

Tags
#linear-algebra #matrix-decomposition
Question

eigendecomposition is sometimes also called [...]
spectral decomposition

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In linear algebra, eigendecomposition or sometimes spectral decomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in

#### Original toplevel document

Eigendecomposition of a matrix - Wikipedia
| ocultar ahora Eigendecomposition of a matrix From Wikipedia, the free encyclopedia (Redirected from Eigendecomposition) Jump to: navigation, search <span>In linear algebra, eigendecomposition or sometimes spectral decomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way. Contents [hide] 1 Fundamental theory of matrix eigenvectors and eigenvalues 2 Eigendecomposition of a matrix 2.1 Example 2.2 Matrix inverse via eigendecomposition 2.2.1 Pr

#### Flashcard 1729650101516

Tags
#linear-algebra #matrix-decomposition
Question

eigendecomposition factorises a matrix into a [...] form

canonical

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In linear algebra, eigendecomposition or sometimes spectral decomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way.

#### Original toplevel document

Eigendecomposition of a matrix - Wikipedia
| ocultar ahora Eigendecomposition of a matrix From Wikipedia, the free encyclopedia (Redirected from Eigendecomposition) Jump to: navigation, search <span>In linear algebra, eigendecomposition or sometimes spectral decomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way. Contents [hide] 1 Fundamental theory of matrix eigenvectors and eigenvalues 2 Eigendecomposition of a matrix 2.1 Example 2.2 Matrix inverse via eigendecomposition 2.2.1 Pr

#### Annotation 1729651674380

#topology
In geometry, an affine transformation, affine map or an affinity (from the Latin, affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes.

#### Parent (intermediate) annotation

Open it
In geometry, an affine transformation, affine map  or an affinity (from the Latin, affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes. Also, sets of parallel lines remain parallel after an affine transformation. An affine transformation does not necessarily preserve angles between lines or distances between points, tho

#### Original toplevel document

Affine transformation - Wikipedia
s related to each other leaf by an affine transformation. For instance, the red leaf can be transformed into both the small dark blue leaf and the large light blue leaf by a combination of reflection, rotation, scaling, and translation. <span>In geometry, an affine transformation, affine map  or an affinity (from the Latin, affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes. Also, sets of parallel lines remain parallel after an affine transformation. An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line. Examples of affine transformations include translation, scaling, homothety, similarity transformation, reflection, rotation, shear mapping, and compositions of them in any combination
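
The preserved properties can be checked on a concrete affine map f(p) = Mp + b (matrix, offset, and points are illustrative): collinear points stay collinear, and ratios of distances along the line are kept.

```python
import numpy as np

# An affine map f(p) = M p + b; M here mixes shear and scaling (illustrative).
M = np.array([[2.0, 1.0],
              [0.0, 1.5]])
b = np.array([3.0, -1.0])
f = lambda p: M @ p + b

# Three collinear points with distance ratio |pq| : |qr| = 1 : 2.
p, q, r = np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([3.0, 3.0])
fp, fq, fr = f(p), f(q), f(r)

ratio_before = np.linalg.norm(q - p) / np.linalg.norm(r - q)
ratio_after = np.linalg.norm(fq - fp) / np.linalg.norm(fr - fq)

# 2-D cross product of image direction vectors: zero iff still collinear.
cross = (fq - fp)[0] * (fr - fp)[1] - (fq - fp)[1] * (fr - fp)[0]
```

Angles and absolute distances do change under this map; only collinearity and ratios along the line survive.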

#### Flashcard 1729653247244

Tags
#topology
Question
In geometry, an affine transformation preserves [...objects...].
points, straight lines and planes

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In geometry, an affine transformation, affine map  or an affinity (from the Latin, affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes.

#### Original toplevel document

Affine transformation - Wikipedia
s related to each other leaf by an affine transformation. For instance, the red leaf can be transformed into both the small dark blue leaf and the large light blue leaf by a combination of reflection, rotation, scaling, and translation. <span>In geometry, an affine transformation, affine map  or an affinity (from the Latin, affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes. Also, sets of parallel lines remain parallel after an affine transformation. An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line. Examples of affine transformations include translation, scaling, homothety, similarity transformation, reflection, rotation, shear mapping, and compositions of them in any combination

#### Annotation 1729654820108

#topology
An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line.

#### Parent (intermediate) annotation

Open it
map  or an affinity (from the Latin, affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes. Also, sets of parallel lines remain parallel after an affine transformation. <span>An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line. <span><body><html>

#### Original toplevel document

Affine transformation - Wikipedia
s related to each other leaf by an affine transformation. For instance, the red leaf can be transformed into both the small dark blue leaf and the large light blue leaf by a combination of reflection, rotation, scaling, and translation. <span>In geometry, an affine transformation, affine map  or an affinity (from the Latin, affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes. Also, sets of parallel lines remain parallel after an affine transformation. An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line. Examples of affine transformations include translation, scaling, homothety, similarity transformation, reflection, rotation, shear mapping, and compositions of them in any combination

#### Flashcard 1729656392972

Tags
#topology
Question
affine transformation does not necessarily preserve [...] between lines
angles

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line.

#### Original toplevel document

Affine transformation - Wikipedia
s related to each other leaf by an affine transformation. For instance, the red leaf can be transformed into both the small dark blue leaf and the large light blue leaf by a combination of reflection, rotation, scaling, and translation. <span>In geometry, an affine transformation, affine map  or an affinity (from the Latin, affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes. Also, sets of parallel lines remain parallel after an affine transformation. An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line. Examples of affine transformations include translation, scaling, homothety, similarity transformation, reflection, rotation, shear mapping, and compositions of them in any combination

#### Flashcard 1729657965836

Tags
#topology
Question
affine transformation does not necessarily preserve [...] between points
distances

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line.

#### Original toplevel document

Affine transformation - Wikipedia
s related to each other leaf by an affine transformation. For instance, the red leaf can be transformed into both the small dark blue leaf and the large light blue leaf by a combination of reflection, rotation, scaling, and translation. <span>In geometry, an affine transformation, affine map  or an affinity (from the Latin, affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes. Also, sets of parallel lines remain parallel after an affine transformation. An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line. Examples of affine transformations include translation, scaling, homothety, similarity transformation, reflection, rotation, shear mapping, and compositions of them in any combination

#### Flashcard 1729659538700

Tags
#topology
Question
An affine transformation preserve [...] between points lying on a straight line.
ratios of distances

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line.

#### Original toplevel document

Affine transformation - Wikipedia
s related to each other leaf by an affine transformation. For instance, the red leaf can be transformed into both the small dark blue leaf and the large light blue leaf by a combination of reflection, rotation, scaling, and translation. <span>In geometry, an affine transformation, affine map  or an affinity (from the Latin, affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes. Also, sets of parallel lines remain parallel after an affine transformation. An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line. Examples of affine transformations include translation, scaling, homothety, similarity transformation, reflection, rotation, shear mapping, and compositions of them in any combination

#### Flashcard 1729661111564

Tags
#geometry
Question
An [...] may be obtained by deforming a sphere with an affine transformation .
ellipsoid

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
An ellipsoid is a surface that may be obtained from a sphere by deforming it by means of directional scalings, or more generally, of an affine transformation.

#### Original toplevel document

Ellipsoid - Wikipedia
x^2/a^2 + y^2/b^2 + z^2/c^2 = 1: sphere (top, a=b=c=4), spheroid (bottom left, a=b=5, c=3), tri-axial ellipsoid (bottom right, a=4.5, b=6, c=3) <span>An ellipsoid is a surface that may be obtained from a sphere by deforming it by means of directional scalings, or more generally, of an affine transformation. An ellipsoid is a quadric surface, that is a surface that may be defined as the zero set of a polynomial of degree two in three variables. Among quadric surfaces, an ellipsoid is char
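
To illustrate the "sphere deformed by a directional scaling" reading, a hypothetical NumPy sketch (the semi-axes reuse the tri-axial example values a=4.5, b=6, c=3; the sampling scheme is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c = 4.5, 6.0, 3.0  # semi-axes of the tri-axial example

# Random points on the unit sphere ...
p = rng.normal(size=(1000, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)

# ... pushed through the directional scaling diag(a, b, c) -- an affine map.
q = p * np.array([a, b, c])

# Every image point satisfies x^2/a^2 + y^2/b^2 + z^2/c^2 = 1.
vals = (q[:, 0] / a)**2 + (q[:, 1] / b)**2 + (q[:, 2] / c)**2
assert np.allclose(vals, 1.0)
```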

#### Flashcard 1729662684428

Tags
#geometry
Question
An ellipsoid may be obtained by deforming a sphere with an [...].
affine transformation

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
An ellipsoid is a surface that may be obtained from a sphere by deforming it by means of directional scalings, or more generally, of an affine transformation.

#### Original toplevel document

Ellipsoid - Wikipedia
x^2/a^2 + y^2/b^2 + z^2/c^2 = 1: sphere (top, a=b=c=4), spheroid (bottom left, a=b=5, c=3), tri-axial ellipsoid (bottom right, a=4.5, b=6, c=3) <span>An ellipsoid is a surface that may be obtained from a sphere by deforming it by means of directional scalings, or more generally, of an affine transformation. An ellipsoid is a quadric surface, that is a surface that may be defined as the zero set of a polynomial of degree two in three variables. Among quadric surfaces, an ellipsoid is char

#### Flashcard 1729664257292

Tags
#multivariate-normal-distribution
Question
The distribution N(μ, Σ) is in effect N(0, I) scaled by [...] , rotated by [...] and translated by [...] .
Λ^{1/2}, U, μ

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The distribution N(μ, Σ) is in effect N(0, I) scaled by Λ 1/2 , rotated by U and translated by μ.

#### Original toplevel document

Multivariate normal distribution - Wikipedia
μ + U 𝒩(0, Λ). Moreover, U can be chosen to be a rotation matrix, as inverting an axis does not have any effect on N(0, Λ), but inverting a column changes the sign of U's determinant. <span>The distribution N(μ, Σ) is in effect N(0, I) scaled by Λ^{1/2}, rotated by U and translated by μ. Conversely, any choice of μ, full rank matrix U, and positive diagonal entries Λ_i yields a non-singular multivariate normal distribution. If any Λ_i is zero and U is square, the re

#### Flashcard 1729666616588

Tags
#multivariate-normal-distribution
Question
The directions of the principal axes of the ellipsoids are given by [...] of the covariance matrix Σ
the eigenvectors

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The directions of the principal axes of the ellipsoids are given by the eigenvectors of the covariance matrix Σ. The squared relative lengths of the principal axes are given by the corresponding eigenvalues.

#### Original toplevel document

Multivariate normal distribution - Wikipedia
urs of a non-singular multivariate normal distribution are ellipsoids (i.e. linear transformations of hyperspheres) centered at the mean. Hence the multivariate normal distribution is an example of the class of elliptical distributions. <span>The directions of the principal axes of the ellipsoids are given by the eigenvectors of the covariance matrix Σ. The squared relative lengths of the principal axes are given by the corresponding eigenvalues. If Σ = UΛU^T = UΛ^{1/2}(UΛ^{1/2})^T is an eigendecomposition where the columns of U are unit eigenvectors and Λ is a diagonal matrix of the eigenvalues, then we have
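
The eigendecomposition reading can be verified numerically. A hypothetical NumPy sketch (the covariance matrix is an arbitrary example):

```python
import numpy as np

# Hypothetical covariance matrix (symmetric positive definite).
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

# Eigendecomposition Sigma = U @ diag(lam) @ U.T.
lam, U = np.linalg.eigh(Sigma)

# Columns of U: directions of the principal axes of the equidensity ellipses.
# Eigenvalues lam: squared relative lengths of those axes.
assert np.allclose(U @ np.diag(lam) @ U.T, Sigma)

# U @ diag(sqrt(lam)) is a square root of Sigma: N(mu, Sigma) is N(0, I)
# scaled by Lambda^(1/2), rotated by U, then translated by mu.
half = U @ np.diag(np.sqrt(lam))
assert np.allclose(half @ half.T, Sigma)
```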

#### Flashcard 1729668189452

Tags
#multivariate-normal-distribution
Question
[...] of the principal axes are given by the corresponding eigenvalues.
The squared relative lengths

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The directions of the principal axes of the ellipsoids are given by the eigenvectors of the covariance matrix Σ. The squared relative lengths of the principal axes are given by the corresponding eigenvalues.

#### Original toplevel document

Multivariate normal distribution - Wikipedia
urs of a non-singular multivariate normal distribution are ellipsoids (i.e. linear transformations of hyperspheres) centered at the mean. Hence the multivariate normal distribution is an example of the class of elliptical distributions. <span>The directions of the principal axes of the ellipsoids are given by the eigenvectors of the covariance matrix Σ. The squared relative lengths of the principal axes are given by the corresponding eigenvalues. If Σ = UΛU^T = UΛ^{1/2}(UΛ^{1/2})^T is an eigendecomposition where the columns of U are unit eigenvectors and Λ is a diagonal matrix of the eigenvalues, then we have

#### Flashcard 1729669762316

Tags
#multivariate-normal-distribution
Question
The equidensity contours of a non-singular multivariate normal distribution are [...] centered at the mean.
ellipsoids

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The equidensity contours of a non-singular multivariate normal distribution are ellipsoids (i.e. linear transformations of hyperspheres) centered at the mean.

#### Original toplevel document

Multivariate normal distribution - Wikipedia
implies that the variance of the dot product must be positive. An affine transformation of X such as 2X is not the same as the sum of two independent realisations of X. Geometric interpretation[edit source] See also: Confidence region <span>The equidensity contours of a non-singular multivariate normal distribution are ellipsoids (i.e. linear transformations of hyperspheres) centered at the mean.  Hence the multivariate normal distribution is an example of the class of elliptical distributions. The directions of the principal axes of the ellipsoids are given by the eigenvec

#### Flashcard 1729672908044

Tags
#multivariate-normal-distribution
Question
If Y = c + BX is an affine transformation,
then Y has a multivariate normal distribution with expected value [...]
c + Bμ

Corollaries: sums of Gaussian are Gaussian, marginals of Gaussian are Gaussian.

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
If Y = c + BX is an affine transformation of where c is an vector of constants and B is a constant matrix, then Y has a multivariate normal distribution with expected value c + Bμ and variance BΣB T . Corollaries: sums of Gaussian are Gaussian, marginals of Gaussian are Gaussian.

#### Original toplevel document

Multivariate normal distribution - Wikipedia
Σ′ = [Σ11 Σ13; Σ31 Σ33]. Affine transformation[edit source] <span>If Y = c + BX is an affine transformation of X ~ N(μ, Σ), where c is an M×1 vector of constants and B is a constant M×N matrix, then Y has a multivariate normal distribution with expected value c + Bμ and variance BΣB^T, i.e., Y ~ N(c + Bμ, BΣB^T). In particular, any subset of the X_i has a marginal distribution that is also multivariate normal. To see this, consider the following example: to extract the subset (X 1 , X 2 , X 4 )
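
The closed form for Y = c + BX can be checked against simulated draws. A hypothetical NumPy sketch (μ, Σ, B, c, the seed, and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example parameters.
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 0.5]])
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
c = np.array([10.0, -5.0])

# Closed form: Y = c + B X has mean c + B mu and covariance B Sigma B^T.
mean_Y = c + B @ mu
cov_Y = B @ Sigma @ B.T

# Monte-Carlo check.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = c + X @ B.T
assert np.allclose(Y.mean(axis=0), mean_Y, atol=0.05)
assert np.allclose(np.cov(Y.T), cov_Y, atol=0.15)
```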

#### Flashcard 1729674480908

Tags
#multivariate-normal-distribution
Question
If Y = c + BX, then Y has variance [...]
BΣBT

Corollaries: sums of Gaussian are Gaussian, marginals of Gaussian are Gaussian.

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
><head> If Y = c + BX is an affine transformation of where c is an vector of constants and B is a constant matrix, then Y has a multivariate normal distribution with expected value c + Bμ and variance BΣB T . Corollaries: sums of Gaussian are Gaussian, marginals of Gaussian are Gaussian. <html>

#### Original toplevel document

Multivariate normal distribution - Wikipedia
Σ′ = [Σ11 Σ13; Σ31 Σ33]. Affine transformation[edit source] <span>If Y = c + BX is an affine transformation of X ~ N(μ, Σ), where c is an M×1 vector of constants and B is a constant M×N matrix, then Y has a multivariate normal distribution with expected value c + Bμ and variance BΣB^T, i.e., Y ~ N(c + Bμ, BΣB^T). In particular, any subset of the X_i has a marginal distribution that is also multivariate normal. To see this, consider the following example: to extract the subset (X 1 , X 2 , X 4 )

#### Flashcard 1729676053772

Tags
#multivariate-normal-distribution
Question

To obtain the marginal distribution over a subset of multivariate normal random variables, one only needs to [...]

drop the irrelevant variables

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
To obtain the marginal distribution over a subset of multivariate normal random variables, one only needs to drop the irrelevant variables (the variables that one wants to marginalize out) from the mean vector and the covariance matrix. The proof for this follows from the definitions of multivariate normal distributions an

#### Original toplevel document

Multivariate normal distribution - Wikipedia
E(X_1 ∣ X_2 …) = ρ E(X_2 ∣ X_2 …) and then using the properties of the expectation of a truncated normal distribution. Marginal distributions[edit source] <span>To obtain the marginal distribution over a subset of multivariate normal random variables, one only needs to drop the irrelevant variables (the variables that one wants to marginalize out) from the mean vector and the covariance matrix. The proof for this follows from the definitions of multivariate normal distributions and linear algebra. Example Let X = [X 1 , X 2 , X 3 ] be multivariate normal random variables with mean vector μ = [μ 1 , μ 2 , μ 3 ] and covariance matrix Σ (standard parametrization for multivariate
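
Marginalizing by "dropping the irrelevant variables" is literally index selection on μ and Σ. A hypothetical NumPy sketch (the example values are arbitrary):

```python
import numpy as np

# Hypothetical mean vector and covariance matrix for X = (X1, X2, X3).
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])

# Marginal of (X1, X3): keep indices 0 and 2, drop index 1 everywhere.
keep = [0, 2]
mu_m = mu[keep]
Sigma_m = Sigma[np.ix_(keep, keep)]

assert np.allclose(mu_m, [1.0, 3.0])
assert np.allclose(Sigma_m, [[1.0, 0.2],
                             [0.2, 1.5]])
```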

#### Annotation 1729678413068

#multivariate-normal-distribution

the distribution of x1 conditional on x2 = a is multivariate normal, (x1 | x2 = a) ~ N(μ̄, Σ̄), where μ̄ = μ1 + Σ12Σ22^{-1}(a − μ2) and covariance matrix Σ̄ = Σ11 − Σ12Σ22^{-1}Σ21

#### Parent (intermediate) annotation

Open it
Conditional distributions If N-dimensional x is partitioned as x = [x1; x2] and accordingly μ and Σ are partitioned, then the distribution of x 1 conditional on x 2 = a is multivariate normal (x 1 | x 2 = a) ~ N(μ̄, Σ̄), where μ̄ = μ1 + Σ12Σ22^{-1}(a − μ2) and covariance matrix Σ̄ = Σ11 − Σ12Σ22^{-1}Σ21. This matrix is the Schur complement of Σ 22 in Σ. This means that to calculate the conditional covariance matrix, one inverts the overall covariance matrix, drops t

#### Original toplevel document

Multivariate normal distribution - Wikipedia
y two or more of its components that are pairwise independent are independent. But, as pointed out just above, it is not true that two random variables that are (separately, marginally) normally distributed and uncorrelated are independent. <span>Conditional distributions[edit source] If N-dimensional x is partitioned as x = [x1; x2] with sizes [q×1; (N−q)×1], and accordingly μ and Σ are partitioned as μ = [μ1; μ2] and Σ = [Σ11 Σ12; Σ21 Σ22], then the distribution of x1 conditional on x2 = a is multivariate normal, (x1 | x2 = a) ~ N(μ̄, Σ̄), where $$\bar{\mu} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2)$$ and covariance matrix $$\bar{\Sigma} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$ This matrix is the Schur complement of Σ22 in Σ.
This means that to calculate the conditional covariance matrix, one inverts the overall covariance matrix, drops the rows and columns corresponding to the variables being conditioned upon, and then inverts back to get the conditional covariance matrix. Here Σ22^{-1} is the generalized inverse of Σ22. Note that knowing that x2 = a alters the variance, though the new variance does not depend on the specific value of a; perhaps more surprisingly, the mean is shifted by Σ12Σ22^{-1}(a − μ2); compare this with the situation of not knowing the value of a, in which case x1 would have distribution N_q(μ1, Σ11). An interesting fact derived in order to prove this result is that the random vectors x2 and y1 = x1 − Σ12Σ22^{-1}x2 are independent. The matrix Σ12Σ22^{-1} is known as the matrix of regression coefficients. Bivariate case[edit source] In the bivariate case where x is partitioned into X1 and X2, the conditional distribution of X1 given X2 is
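
The conditional-distribution formulas above can be sketched directly. A hypothetical NumPy example with q = 1, N = 2 (all values arbitrary):

```python
import numpy as np

# Partition x = (x1, x2) with q = 1, N = 2; all values hypothetical.
mu1, mu2 = np.array([0.0]), np.array([1.0])
S11 = np.array([[2.0]])
S12 = np.array([[0.8]])
S21 = S12.T
S22 = np.array([[1.0]])
a = np.array([2.0])  # observed value of x2

# Conditional mean and covariance (Schur complement of S22 in Sigma).
S22_inv = np.linalg.inv(S22)
mu_bar = mu1 + S12 @ S22_inv @ (a - mu2)
Sigma_bar = S11 - S12 @ S22_inv @ S21

# The mean is shifted toward the observation; the variance shrinks from 2 to 1.36.
assert np.allclose(mu_bar, [0.8])
assert np.allclose(Sigma_bar, [[1.36]])
```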

#### Flashcard 1729680772364

Tags
#multivariate-normal-distribution
Question

the distribution of x1 conditional on x2 = a is multivariate normal (x1 | x2 = a) ~ N(μ̄, Σ̄) where μ̄ = [...] and covariance matrix Σ̄ = [...]
$$\bar{\mu} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2), \qquad \bar{\Sigma} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
the distribution of x 1 conditional on x 2 = a is multivariate normal (x 1 | x 2 = a) ~ N(μ̄, Σ̄), where μ̄ = μ1 + Σ12Σ22^{-1}(a − μ2) and covariance matrix Σ̄ = Σ11 − Σ12Σ22^{-1}Σ21

#### Original toplevel document

Multivariate normal distribution - Wikipedia
y two or more of its components that are pairwise independent are independent. But, as pointed out just above, it is not true that two random variables that are (separately, marginally) normally distributed and uncorrelated are independent. <span>Conditional distributions[edit source] If N-dimensional x is partitioned as x = [x1; x2] with sizes [q×1; (N−q)×1], and accordingly μ and Σ are partitioned as μ = [μ1; μ2] and Σ = [Σ11 Σ12; Σ21 Σ22], then the distribution of x1 conditional on x2 = a is multivariate normal, (x1 | x2 = a) ~ N(μ̄, Σ̄), where $$\bar{\mu} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2)$$ and covariance matrix $$\bar{\Sigma} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$ This matrix is the Schur complement of Σ22 in Σ.
This means that to calculate the conditional covariance matrix, one inverts the overall covariance matrix, drops the rows and columns corresponding to the variables being conditioned upon, and then inverts back to get the conditional covariance matrix. Here Σ22^{-1} is the generalized inverse of Σ22. Note that knowing that x2 = a alters the variance, though the new variance does not depend on the specific value of a; perhaps more surprisingly, the mean is shifted by Σ12Σ22^{-1}(a − μ2); compare this with the situation of not knowing the value of a, in which case x1 would have distribution N_q(μ1, Σ11). An interesting fact derived in order to prove this result is that the random vectors x2 and y1 = x1 − Σ12Σ22^{-1}x2 are independent. The matrix Σ12Σ22^{-1} is known as the matrix of regression coefficients. Bivariate case[edit source] In the bivariate case where x is partitioned into X1 and X2, the conditional distribution of X1 given X2 is

#### Flashcard 1729683393804

Tags
#matrix-inversion
Question

The pseudoinverse can be computed using [...].

singular value decomposition

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. It can be computed using the singular value decomposition.

#### Original toplevel document

Moore–Penrose inverse - Wikipedia
tion (see below under § Applications). Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the statement and proof of results in linear algebra. <span>The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. It can be computed using the singular value decomposition.
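
Computing the pseudoinverse from the SVD can be sketched as follows (a hypothetical NumPy example; the rank-deficient matrix and tolerance rule are arbitrary choices):

```python
import numpy as np

# A rank-1 matrix: the ordinary inverse does not exist.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# Pseudoinverse via the SVD: A = U S V^T  =>  A+ = V S+ U^T, where S+
# inverts the singular values above a small tolerance and zeros the rest.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_inv = np.where(s > tol, 1.0 / s, 0.0)
A_pinv = Vt.T @ np.diag(s_inv) @ U.T

# Matches NumPy's built-in and satisfies the Moore-Penrose condition A A+ A = A.
assert np.allclose(A_pinv, np.linalg.pinv(A))
assert np.allclose(A @ A_pinv @ A, A)
```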

#### Flashcard 1729684966668

Tags
#matrix-inversion
Question
A common use of the pseudoinverse is to compute a [...] to a system of linear equations that lacks a unique solution
'best fit' (least squares) solution

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
A common use of the pseudoinverse is to compute a 'best fit' (least squares) solution to a system of linear equations that lacks a unique solution

#### Original toplevel document

Moore–Penrose inverse - Wikipedia
tegral operators in 1903. When referring to a matrix, the term pseudoinverse, without further specification, is often used to indicate the Moore–Penrose inverse. The term generalized inverse is sometimes used as a synonym for pseudoinverse. <span>A common use of the pseudoinverse is to compute a 'best fit' (least squares) solution to a system of linear equations that lacks a unique solution (see below under § Applications). Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the
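
The "best fit" use can be sketched on a small overdetermined system. A hypothetical NumPy example (the system is an arbitrary choice):

```python
import numpy as np

# Overdetermined system A x = b with no exact solution.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# x = A+ @ b is the least-squares ('best fit') solution.
x = np.linalg.pinv(A) @ b

# It agrees with lstsq, and with the hand-solved normal equations (2/3, 1/2).
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
assert np.allclose(x, [2.0 / 3.0, 0.5])
```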

#### Flashcard 1729687325964

Tags
#linear-algebra
Question
Formally, given a matrix A ∈ R^{n×m} and a matrix A^g ∈ R^{m×n}, A^g is a generalized inverse of A if it satisfies the condition [...].
$$A A^{\mathrm{g}} A = A$$

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Formally, given a matrix and a matrix , A is a generalized inverse of if it satisfies the condition : .

#### Original toplevel document

Generalized inverse - Wikipedia
nverses can be defined in any mathematical structure that involves associative multiplication, that is, in a semigroup. This article describes generalized inverses of a matrix A. <span>Formally, given a matrix A ∈ R^{n×m} and a matrix A^g ∈ R^{m×n}, A^g is a generalized inverse of A if it satisfies the condition A A^g A = A. The purpose of constructing a generalized inverse of a matrix is to obtain a matrix that can serve as an inverse in some sense for a wider class of matrices than invertibl
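
The defining condition A A^g A = A can be checked on a small example; it also shows that a generalized inverse need not be unique. A hypothetical NumPy sketch (the matrices are arbitrary choices):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])

# A hypothetical generalized inverse: the 5.0 sits where A carries no
# information, so the defining condition still holds.
G = np.array([[1.0, 0.0],
              [0.0, 5.0]])

assert np.allclose(A @ G @ A, A)              # A A^g A = A
assert not np.allclose(G, np.linalg.pinv(A))  # yet G is not the Moore-Penrose inverse
```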

#### Annotation 1729689685260

#linear-algebra
In mathematics, and in particular, algebra, a generalized inverse of an element x is an element y that has some properties of an inverse element but not necessarily all of them.

#### Parent (intermediate) annotation

Open it
In mathematics, and in particular, algebra, a generalized inverse of an element x is an element y that has some properties of an inverse element but not necessarily all of them. Generalized inverses can be defined in any mathematical structure that involves associative multiplication, that is, in a semigroup.

#### Original toplevel document

Generalized inverse - Wikipedia
"Pseudoinverse" redirects here. For the Moore–Penrose inverse, sometimes referred to as "the pseudoinverse", see Moore–Penrose inverse. <span>In mathematics, and in particular, algebra, a generalized inverse of an element x is an element y that has some properties of an inverse element but not necessarily all of them. Generalized inverses can be defined in any mathematical structure that involves associative multiplication, that is, in a semigroup. This article describes generalized inverses of a matrix A. Formally, given a matrix A ∈

#### Flashcard 1729691258124

Tags
#linear-algebra
Question
a [...] has some properties of an inverse element but not necessarily all of them.
generalized inverse

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In mathematics, and in particular, algebra, a generalized inverse of an element x is an element y that has some properties of an inverse element but not necessarily all of them.

#### Original toplevel document

Generalized inverse - Wikipedia
"Pseudoinverse" redirects here. For the Moore–Penrose inverse, sometimes referred to as "the pseudoinverse", see Moore–Penrose inverse. <span>In mathematics, and in particular, algebra, a generalized inverse of an element x is an element y that has some properties of an inverse element but not necessarily all of them. Generalized inverses can be defined in any mathematical structure that involves associative multiplication, that is, in a semigroup. This article describes generalized inverses of a matrix A. Formally, given a matrix A ∈

#### Flashcard 1729692830988

Tags
#multivariate-normal-distribution
Question

In the bivariate normal case the expression for the mutual information is [...]
$$I(x;y) = -\tfrac{1}{2}\ln(1-\rho^{2})$$

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In the bivariate case the expression for the mutual information is:

#### Original toplevel document

Multivariate normal distribution - Wikipedia
ρ₀ is the correlation matrix constructed from Σ₀. <span>In the bivariate case the expression for the mutual information is: $$I(x;y) = -\frac{1}{2}\ln(1-\rho^{2}).$$ Cumulative distribution function[edit source] The notion of cumulative distribution function (cdf) in dimension 1 can be extended in two ways to the multidimensional case, based
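
The bivariate formula can be evaluated directly. A hypothetical NumPy sketch (ρ = 0.6 is an arbitrary example value):

```python
import numpy as np

rho = 0.6  # hypothetical correlation
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# I(x; y) = -1/2 * ln(1 - rho^2) for a bivariate normal.
I = -0.5 * np.log(1.0 - rho**2)

# Equivalently -1/2 * ln det(correlation matrix), since det = 1 - rho^2 here.
assert np.isclose(I, -0.5 * np.log(np.linalg.det(Sigma)))
print(round(I, 4))  # 0.2231
```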

#### Flashcard 1729696763148

Tags
#multivariate-normal-distribution
Question
The mutual information of a distribution is a special case of the Kullback–Leibler divergence in which is [...] and is [...]
the full multivariate distribution, the product of the 1-dimensional marginal distributions

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The mutual information of a distribution is a special case of the Kullback–Leibler divergence in which P is the full multivariate distribution and Q is the product of the 1-dimensional marginal distributions

#### Original toplevel document

Multivariate normal distribution - Wikipedia
D_KL(CN₀ ‖ CN₁) = tr(Σ₁⁻¹ Σ₀) − k + ln(|Σ₁| / |Σ₀|). Mutual information[edit source] <span>The mutual information of a distribution is a special case of the Kullback–Leibler divergence in which P is the full multivariate distribution and Q is the product of the 1-dimensional marginal distributions. In the notation of the Kullback–Leibler divergence section of this article, Σ 1

#### Flashcard 1729699122444

Tags
#multivariate-normal-distribution
Question
The multivariate normal distribution is often used to describe correlated real-valued random variables each of which [...]
clusters around a mean value

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value

#### Original toplevel document

Multivariate normal distribution - Wikipedia
e definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. <span>The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.

#### Flashcard 1729700695308

Tags
#multivariate-normal-distribution
Question
a random vector is said to be k-variate normally distributed if [...] has a univariate normal distribution.
every linear combination of its k components

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution.

#### Original toplevel document

Multivariate normal distribution - Wikipedia
In probability theory and statistics, the multivariate normal distribution or multivariate Gaussian distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. <span>One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly)
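The defining property (every linear combination of the components is univariate normal, with mean aᵀμ and variance aᵀΣa) can be illustrated by simulation. A sketch assuming NumPy; the mean, covariance, and weight vector are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.6],
                [0.6, 1.0]])
x = rng.multivariate_normal(mean, cov, size=200_000)

# Any linear combination z = a @ x of a k-variate normal vector is
# univariate normal with mean a @ mu and variance a @ Sigma @ a.
a = np.array([0.5, -1.5])
z = x @ a
assert abs(z.mean() - a @ mean) < 0.02       # a @ mean = 3.5
assert abs(z.var() - a @ cov @ a) < 0.05     # a @ cov @ a = 1.85
```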

#### Annotation 1729702268172

#probability
The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution.

#### Parent (intermediate) annotation

Open it
The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a Poisson(λ) distribution, where λ is itself a random variable, distributed as a gamma distribution with shape = r and scale θ = p/(1 − p)

#### Original toplevel document

Negative binomial distribution - Wikipedia
) . {\displaystyle \operatorname {Poisson} (\lambda )=\lim _{r\to \infty }\operatorname {NB} \left(r,{\frac {\lambda }{\lambda +r}}\right).} Gamma–Poisson mixture[edit source] <span>The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a Poisson(λ) distribution, where λ is itself a random variable, distributed as a gamma distribution with shape = r and scale θ = p/(1 − p) or correspondingly rate β = (1 − p)/p. To display the intuition behind this statement, consider two independent Poisson processes, “Success” and “Failure”, with intensities p and 1 − p. Together, the Success and Failure pr

#### Flashcard 1729703841036

Tags
#probability
Question
The negative binomial distribution also arises as a [...] of Poisson distributions
continuous mixture

Can be used to model over dispersed count observations, known as Gamma-Poisson distribution.

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution.

#### Original toplevel document

Negative binomial distribution - Wikipedia
Poisson(λ) = lim_{r→∞} NB(r, λ/(λ + r)). Gamma–Poisson mixture[edit source] <span>The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a Poisson(λ) distribution, where λ is itself a random variable, distributed as a gamma distribution with shape = r and scale θ = p/(1 − p) or correspondingly rate β = (1 − p)/p. To display the intuition behind this statement, consider two independent Poisson processes, “Success” and “Failure”, with intensities p and 1 − p. Together, the Success and Failure pr

#### Flashcard 1729705413900

Tags
#probability
Question
[...] also arises as a continuous mixture as Gamma-Poisson distributions
The negative binomial distribution

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution. </spa

#### Original toplevel document

Negative binomial distribution - Wikipedia
Poisson(λ) = lim_{r→∞} NB(r, λ/(λ + r)). Gamma–Poisson mixture[edit source] <span>The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a Poisson(λ) distribution, where λ is itself a random variable, distributed as a gamma distribution with shape = r and scale θ = p/(1 − p) or correspondingly rate β = (1 − p)/p. To display the intuition behind this statement, consider two independent Poisson processes, “Success” and “Failure”, with intensities p and 1 − p. Together, the Success and Failure pr

#### Flashcard 1729706986764

Tags
#probability
Question
[...] distribution is the number of successes in a sequence of iid Bernoulli trials before a specified number of failures (denoted r) occurs.
negative binomial distribution

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of fai

#### Original toplevel document

Negative binomial distribution - Wikipedia
r / ((1 − p)² p) <span>In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures (denoted r) occurs. For example, if we define a 1 as failure, all non-1s as successes, and we throw a dice repeatedly until the third time 1 appears (r = three failures), then the probability distribution

#### Annotation 1729709870348

#singular-value-decomposition
In linear algebra, the singular-value decomposition (SVD) generalises the eigendecomposition of a positive semidefinite normal matrix (for example, a symmetric matrix with positive eigenvalues) to any matrix via an extension of the polar decomposition.

Singular-value decomposition - Wikipedia
nto three simple transformations: an initial rotation V∗, a scaling Σ along the coordinate axes, and a final rotation U. The lengths σ₁ and σ₂ of the semi-axes of the ellipse are the singular values of M, namely Σ₁,₁ and Σ₂,₂. <span>In linear algebra, the singular-value decomposition (SVD) is a factorization of a real or complex matrix. It is the generalization of the eigendecomposition of a positive semidefinite normal matrix (for example, a symmetric matrix with positive eigenvalues) to any m × n matrix via an extension of the polar decomposition. It has many useful applications in signal processing and statistics. Formally, the singular-value decomposition of an m × n

#### Annotation 1729711967500

#singular-value-decomposition
Formally, the singular-value decomposition of an m × n real or complex matrix M is a factorization of the form M = UΣV∗, where U is an m × m real or complex unitary matrix, Σ is an m × n rectangular diagonal matrix with non-negative real numbers on the diagonal, and V is an n × n real or complex unitary matrix.

Singular-value decomposition - Wikipedia
ositive eigenvalues) to any m × n matrix via an extension of the polar decomposition. It has many useful applications in signal processing and statistics. <span>Formally, the singular-value decomposition of an m × n real or complex matrix M is a factorization of the form UΣV∗, where U is an m × m real or complex unitary matrix, Σ is an m × n rectangular diagonal matrix with non-negative real numbers on the diagonal, and V is an n × n real or complex unitary matrix. The diagonal entries σ_i of
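The factorization M = UΣV∗ can be illustrated with NumPy's `np.linalg.svd`. A minimal sketch (the test matrix is an illustrative random example; for a real matrix, "unitary" reduces to "orthogonal"):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))

# full_matrices=True returns U (4x4), the singular values s (3,),
# and Vh = V* (3x3).
U, s, Vh = np.linalg.svd(M, full_matrices=True)

# Rebuild the 4x3 rectangular diagonal Sigma and check M = U Sigma V*.
Sigma = np.zeros((4, 3))
np.fill_diagonal(Sigma, s)
assert np.allclose(U @ Sigma @ Vh, M)

# U and V are unitary (orthogonal here, since M is real).
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vh @ Vh.T, np.eye(3))
```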

#### Flashcard 1729714326796

Tags
#singular-value-decomposition
Question
[...] generalises eigendecomposition of a positive semidefinite normal matrix to any matrix
singular-value decomposition (SVD)

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In linear algebra, the singular-value decomposition (SVD) generalises the eigendecomposition of a positive semidefinite normal matrix (for example, a symmetric matrix with positive eigenvalues) to any matrix via an extension of the polar decomposition.

#### Original toplevel document

Singular-value decomposition - Wikipedia
nto three simple transformations: an initial rotation V∗, a scaling Σ along the coordinate axes, and a final rotation U. The lengths σ₁ and σ₂ of the semi-axes of the ellipse are the singular values of M, namely Σ₁,₁ and Σ₂,₂. <span>In linear algebra, the singular-value decomposition (SVD) is a factorization of a real or complex matrix. It is the generalization of the eigendecomposition of a positive semidefinite normal matrix (for example, a symmetric matrix with positive eigenvalues) to any m × n matrix via an extension of the polar decomposition. It has many useful applications in signal processing and statistics. Formally, the singular-value decomposition of an m × n

#### Annotation 1729716686092

#singular-value-decomposition
Formally, the singular-value decomposition of an m × n real or complex matrix M is a factorization of the form M = UΣV∗.

#### Parent (intermediate) annotation

Open it
Formally, the singular-value decomposition of an m × n real or complex matrix M is a factorization of the form M = UΣV∗, where U is an m × m real or complex unitary matrix, Σ is an m × n rectangular diagonal matrix with non-negative real numbers on the diagonal, and V is an n × n real or complex unitary matrix.

#### Original toplevel document

Singular-value decomposition - Wikipedia
ositive eigenvalues) to any m × n matrix via an extension of the polar decomposition. It has many useful applications in signal processing and statistics. <span>Formally, the singular-value decomposition of an m × n real or complex matrix M is a factorization of the form UΣV∗, where U is an m × m real or complex unitary matrix, Σ is an m × n rectangular diagonal matrix with non-negative real numbers on the diagonal, and V is an n × n real or complex unitary matrix. The diagonal entries σ_i of

#### Flashcard 1729718258956

Tags
#singular-value-decomposition
Question
singular-value decomposition factorises an m × n matrix M to the form [...]
$$M = U\Sigma V^*$$

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Formally, the singular-value decomposition of an m × n real or complex matrix M is a factorization of the form M = UΣV∗

#### Original toplevel document

Singular-value decomposition - Wikipedia
ositive eigenvalues) to any m × n matrix via an extension of the polar decomposition. It has many useful applications in signal processing and statistics. <span>Formally, the singular-value decomposition of an m × n real or complex matrix M is a factorization of the form UΣV∗, where U is an m × m real or complex unitary matrix, Σ is an m × n rectangular diagonal matrix with non-negative real numbers on the diagonal, and V is an n × n real or complex unitary matrix. The diagonal entries σ_i of

#### Flashcard 1729720618252

Tags
#singular-value-decomposition
Question
With a factorization of the form M = UΣV∗, the matrices U, Σ, V represent [...]
U: an m × m real or complex unitary matrix; Σ: an m × n rectangular diagonal matrix with non-negative real numbers on the diagonal; V: an n × n real or complex unitary matrix

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Formally, the singular-value decomposition of an m × n real or complex matrix M is a factorization of the form M = UΣV∗, where U is an m × m real or complex unitary matrix, Σ is an m × n rectangular diagonal matrix with non-negative real numbers on the diagonal, and V is an n × n real or complex unitary matrix.

#### Original toplevel document

Singular-value decomposition - Wikipedia
ositive eigenvalues) to any m × n matrix via an extension of the polar decomposition. It has many useful applications in signal processing and statistics. <span>Formally, the singular-value decomposition of an m × n real or complex matrix M is a factorization of the form UΣV∗, where U is an m × m real or complex unitary matrix, Σ is an m × n rectangular diagonal matrix with non-negative real numbers on the diagonal, and V is an n × n real or complex unitary matrix. The diagonal entries σ_i of

#### Annotation 1729724550412

#quantecon

OOP is about producing well-organized code, an important determinant of productivity