# on 03-Sep-2019 (Tue)

#### Annotation 4357059120396

The conventional Euclidean norm is not appropriate because, as discussed in Section 9.2, some states are more important than others because they occur more frequently or because we are more interested in them (Section 9.11).

#### pdf

cannot see any pdfs

#### Annotation 4361765129484

#computer-science #machine-learning #reinforcement-learning
Among the algorithms investigated so far in this book, only the Monte Carlo methods are true SGD methods. These methods converge robustly under both on-policy and off-policy training as well as for general nonlinear (differentiable) function approximators, though they are often slower than semi-gradient methods with bootstrapping, which are not SGD methods.

#### Annotation 4362122431756

#reinforcement-learning

We consider a broader class of Bellman equations that are non-linear in the rewards and future values: $$v(s)=\mathbb{E}\left[f\left(R_{t+1}, v\left(S_{t+1}\right)\right) | S_{t}=s, A_{t} \sim \pi\left(S_{t}\right)\right].$$
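
The recursion can be solved by the usual dynamic-programming sweep, with $f$ replacing the linear backup $r + \gamma v$. This is a hedged sketch: the three-state chain, the transition table, and the particular nonlinear $f$ below are illustrative, not from the paper.

```python
import numpy as np

# Illustrative 3-state chain: s0 -> s1 -> s2 (terminal), reward 1 per step.
next_state = {0: 1, 1: 2}
reward = {0: 1.0, 1: 1.0}
gamma = 0.9

def evaluate(f, n_states=3, sweeps=100):
    """Fixed-point iteration on v(s) = E[f(R_{t+1}, v(S_{t+1}))]."""
    v = np.zeros(n_states)
    for _ in range(sweeps):
        for s, s_next in next_state.items():
            v[s] = f(reward[s], v[s_next])
    return v

v_linear = evaluate(lambda r, v: r + gamma * v)            # standard Bellman
v_nonlin = evaluate(lambda r, v: r + gamma * v / (1 + v))  # one nonlinear f
```

With the linear $f$ this reproduces ordinary policy evaluation (here $v(s_0) = 1 + 0.9 \cdot 1 = 1.9$); a saturating nonlinear $f$ like the one above compresses distant value.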

#### Annotation 4362130558220

#reinforcement-learning
Humans and animals seem to exhibit a different type of weighting of the future than would emerge from the standard linear Bellman equation, which leads to exponential discounting when unrolled multiple steps because of the repeated multiplication with γ. One consequence is that the preference ordering of two different rewards occurring at different times can reverse, depending on how far in the future the first reward is. For instance, humans may prefer a single sparse reward of +1 (e.g., $1) now over a reward of +2 (e.g., $2) received a week later, but may also prefer a reward of +2 received after 20 weeks over a reward of +1 after 19 weeks.
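
The reversal can be checked with a few lines of arithmetic. In this hedged sketch, the hyperbolic form $r/(1+kt)$ and the constants $k$ and $\gamma$ are illustrative choices, not taken from the text:

```python
def hyperbolic(r, t, k=1.5):
    """Hyperbolic weighting: value of reward r delayed t weeks."""
    return r / (1 + k * t)

def exponential(r, t, gamma=0.4):
    """Standard exponential discounting with factor gamma per week."""
    return r * gamma ** t

# Hyperbolic weighting reverses: +1 now beats +2 in a week, ...
prefers_1_now = hyperbolic(1, 0) > hyperbolic(2, 1)      # 1.0 > 0.8
# ... but +2 at week 20 beats +1 at week 19.
prefers_2_later = hyperbolic(2, 20) > hyperbolic(1, 19)  # 0.065 > 0.034

# Exponential discounting cannot reverse: the ratio of the two options,
# 2 * gamma, is the same at every horizon.
exp_now = exponential(1, 0) > exponential(2, 1)
exp_later = exponential(1, 19) > exponential(2, 20)
```

Exponential discounting always prefers the same option at both horizons, whatever γ is, because delaying both rewards by the same amount rescales both values by the same factor.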

#### Annotation 4362345778444

#reinforcement-learning
As mentioned in the introduction, in this paper we are interested in the multitask RL scenario, where the agent has to solve multiple tasks. Each task is defined by a reward function $R_{\mathbf{w}}$; thus, instead of a single MDP $M$, our environment is a set of MDPs that share the same structure except for the reward function. Following Barreto et al. (2017), we assume that the expected one-step reward associated with transition $$s \stackrel{a}{\rightarrow} s^{\prime}$$ is given by $$\mathrm{E}\left[R_{\mathbf{w}}\left(s, a, s^{\prime}\right)\right]=r_{\mathbf{w}}\left(s, a, s^{\prime}\right)=\phi\left(s, a, s^{\prime}\right)^{\top} \mathbf{w}$$, where $$\phi\left(s, a, s^{\prime}\right) \in \mathbb{R}^{d}$$ are features of (s, a, s') and $$\mathbf{w} \in \mathbb{R}^{d}$$ are weights.
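
A minimal sketch of the decomposition (the feature values and task weights below are made up for illustration): every task in the family shares the same φ and differs only in w.

```python
import numpy as np

# phi(s, a, s') for three example transitions (rows), d = 3 features.
phi = np.array([[1.0, 0.0, 0.0],   # e.g. "picked up object A"
                [0.0, 1.0, 0.0],   # e.g. "picked up object B"
                [0.0, 0.5, 1.0]])  # a mixed transition

w_task1 = np.array([1.0, 0.0, 0.0])   # task 1 rewards only feature 0
w_task2 = np.array([0.0, 2.0, -1.0])  # task 2 weighs the features differently

r_task1 = phi @ w_task1   # per-transition rewards on task 1
r_task2 = phi @ w_task2   # same transitions, different task
```

Changing the task means changing only the d-dimensional vector w; the dynamics and the features φ stay fixed.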

#### Annotation 4362357050636

#reinforcement-learning
SFs allow one to immediately compute the value of a policy π on any task w: it is easy to show that, when (1) holds, $$Q^{\pi}_{\mathbf{w}}(s, a)=\psi^{\pi}(s, a)^{\top} \mathbf{w}$$. It is also easy to see that SFs satisfy a Bellman equation in which φ play the role of rewards, so ψ can be learned using any RL method (Szepesvári, 2010).
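
Both claims can be sketched on a two-state toy problem (the states, features, and fixed policy are invented for illustration): ψ is learned by dynamic programming with the vector φ in place of the scalar reward, and then Qπw for any task w is a single dot product.

```python
import numpy as np

gamma = 0.9
next_state = [1, 0]                    # deterministic pi: s0 -> s1 -> s0 -> ...
phi = np.array([[1.0, 0.0, 0.0],       # phi(s0, pi(s0), s1)
                [0.0, 1.0, 2.0]])      # phi(s1, pi(s1), s0)

# Successor features obey a Bellman equation with phi as the "reward":
#   psi(s) = phi(s) + gamma * psi(s')
psi = np.zeros_like(phi)
for _ in range(500):
    psi = phi + gamma * psi[next_state]

# Any task w gives the policy's values immediately: Q = psi^T w.
w = np.array([1.0, 0.0, 0.0])          # a task that only rewards feature 0
q = psi @ w
```

The dot product ψᵀw matches ordinary policy evaluation run on the scalar rewards r = φᵀw, which is the point: ψ is computed once, then reused across every w.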

#### Annotation 4362368322828

#reinforcement-learning
As one can see, the types of generalisation provided by UVFAs and SF&GPI are in some sense complementary. It is then natural to ask if we can simultaneously have the two types of generalisation. In this paper we propose a model that provides exactly that. The main insight is actually simple: since SFs are multi-dimensional value functions, we can extend them in the same way as universal value functions extend regular value functions. In the next section we elaborate on how exactly to do so.

#### Annotation 4362545007884

The notion of convexity underlies a lot of beautiful mathematics. When combined with computation, it gives rise to the area of convex optimization that has had a huge impact on understanding and improving the world we live in. However, convexity does not provide all the answers. Many procedures in statistics, machine learning and nature at large—Bayesian inference, deep learning, protein folding—successfully solve non-convex problems that are NP-hard, i.e., intractable on worst-case instances. Moreover, often nature or humans choose methods that are inefficient in the worst case to solve problems in P.

Can we develop a theory to resolve this mismatch between reality and the predictions of worst-case analysis? Such a theory could identify structure in natural inputs that helps sidestep worst-case complexity.

Off the Convex Path (contributors: Sanjeev Arora, Moritz Hardt, Nisheeth Vishnoi, Nadav Cohen)

#### Annotation 4362547105036

This blog is dedicated to the idea that optimization methods—whether created by humans or nature, whether convex or nonconvex—are exciting objects of study and, often lead to useful algorithms and insights into nature. This study can be seen as an extension of classical mathematical fields such as dynamical systems and differential equations among others, but with the important addition of the notion of computational efficiency.

We will report on interesting research directions and open problems, and highlight progress that has been made. We will write articles ourselves as well as encourage others to contribute. In doing so, we hope to generate an active dialog between theorists, scientists and practitioners and to motivate a generation of young researchers to work on these important problems.

#### Annotation 4362549988620

Trajectory Analysis: Implicit Regularization Towards Low Rank

We are interested in understanding what end-to-end matrix $W$ emerges when we run GD on an LNN to minimize a general convex loss $L(W)$, and in particular the matrix completion loss given above. Note that $L(W)$ is convex, but the objective obtained by over-parameterizing with an LNN is not. We analyze the trajectories of $W$, and specifically the dynamics of its singular value decomposition. Denote the singular values by $\{\sigma_r\}_r$, and the corresponding left and right singular vectors by $\{\mathbf{u}_r\}_r$ and $\{\mathbf{v}_r\}_r$ respectively.
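
As a hedged numerical sketch of this setting (the 2×2 problem, the depth, the step size, and the initialization scale are all invented for illustration, not the post's experiment), GD on a depth-2 LNN with small initialization tends to complete the missing entry with a near-minimum-rank solution:

```python
import numpy as np

# Rank-1 ground truth; the bottom-right entry is unobserved.
M = np.array([[1.0, 2.0],
              [2.0, 4.0]])
mask = np.array([[1.0, 1.0],
                 [1.0, 0.0]])

rng = np.random.default_rng(0)
# Over-parameterize W = A @ B (a depth-2 LNN) with near-zero init.
A = 0.01 * rng.normal(size=(2, 2))
B = 0.01 * rng.normal(size=(2, 2))

lr = 0.01
for _ in range(20000):
    G = mask * (A @ B - M)     # gradient of L(W) = 0.5 * ||mask * (W - M)||^2
    A, B = A - lr * G @ B.T, B - lr * A.T @ G

W = A @ B
sigma = np.linalg.svd(W, compute_uv=False)   # the singular values {sigma_r}
```

The observed entries are fit almost exactly while $\sigma_2$ stays near zero, which forces the unobserved entry toward the rank-1 completion rather than some arbitrary fit.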

#### Annotation 4362554182924

Invariant Risk Minimization: An Information Theoretic View

I finally got around to reading this new paper by Arjovsky et al. It debuted on Twitter with a big splash, being described as 'beautiful' and a 'long awaited' 'gem of a paper'. It almost felt like a new superhero movie or Disney remake just came out.

The paper is, indeed, very well written, and describes a very elegant idea, a practical algorithm, some theory and lots of discussion around how this is related to various bits. Here, I will describe the main idea and then provide an information theoretic view on the same topic.

Invariant Risk Minimization: An Information Theoretic View (inFERENCe, July 19, 2019), discussing Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz (2019), Invariant Risk Minimization.

#### Annotation 4362556804364

#computer-science #machine-learning #reinforcement-learning
Thus, following the standard SGD approach, one can derive the per-step update based on a sample of this expected value: $$\mathbf{w}_{t+1}=\mathbf{w}_{t}-\tfrac{1}{2}\alpha\nabla\left(\rho_{t}\delta_{t}^{2}\right)=\mathbf{w}_{t}-\alpha\rho_{t}\delta_{t}\nabla\delta_{t}=\mathbf{w}_{t}+\alpha\rho_{t}\delta_{t}\left(\nabla\hat{v}(S_{t},\mathbf{w}_{t})-\gamma\nabla\hat{v}(S_{t+1},\mathbf{w}_{t})\right), \quad (11.23)$$ which you will recognize as the same as the semi-gradient TD algorithm (11.2) except for the additional final term.
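
The two updates can be put side by side in code. This is a hedged sketch for the linear case $\hat{v}(s,\mathbf{w}) = \mathbf{x}(s)^{\top}\mathbf{w}$, with invented function names; δ, ρ, α, γ follow the book's notation:

```python
import numpy as np

def naive_residual_gradient_step(w, x_s, x_next, r, rho, alpha, gamma):
    """Sample update (11.23): SGD on the squared TD error."""
    delta = r + gamma * x_next @ w - x_s @ w
    # gradient of delta wrt w is (gamma * x_next - x_s), hence the extra term
    return w + alpha * rho * delta * (x_s - gamma * x_next)

def semi_gradient_td_step(w, x_s, x_next, r, rho, alpha, gamma):
    """Semi-gradient TD(0) (11.2): the same update minus the final term."""
    delta = r + gamma * x_next @ w - x_s @ w
    return w + alpha * rho * delta * x_s
```

The difference between the two returned vectors is exactly the extra term $-\alpha\rho_t\delta_t\gamma\nabla\hat{v}(S_{t+1},\mathbf{w}_t)$, which semi-gradient TD drops.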

#### Annotation 4362558377228

#computer-science #machine-learning #reinforcement-learning
In the general function-approximation case, the one-step TD error with discounting is $$\delta_{t}=R_{t+1}+\gamma\hat{v}(S_{t+1},\mathbf{w}_{t})-\hat{v}(S_{t},\mathbf{w}_{t}).$$ A possible objective function then is what one might call the Mean Squared TD Error: $$\mathrm{TDE}(\mathbf{w})=\sum_{s\in\mathcal{S}}\mu(s)\,\mathbb{E}\left[\delta_{t}^{2}\,\middle|\,S_{t}=s,\,A_{t}\sim\pi\right]$$

#### Annotation 4362561785100

#computer-science #machine-learning #reinforcement-learning
The Bellman error for a state is the expected TD error in that state. So let's repeat the derivation above with the expected TD error (all expectations here are implicitly conditional on $S_t$): $$\begin{aligned}\mathbf{w}_{t+1}&=\mathbf{w}_{t}-\tfrac{1}{2}\alpha\nabla\left(\mathbb{E}_{\pi}[\delta_{t}]^{2}\right)\\&=\mathbf{w}_{t}-\tfrac{1}{2}\alpha\nabla\left(\mathbb{E}_{b}[\rho_{t}\delta_{t}]^{2}\right)\\&=\mathbf{w}_{t}-\alpha\,\mathbb{E}_{b}[\rho_{t}\delta_{t}]\,\nabla\mathbb{E}_{b}[\rho_{t}\delta_{t}]\\&=\mathbf{w}_{t}-\alpha\,\mathbb{E}_{b}\left[\rho_{t}\left(R_{t+1}+\gamma\hat{v}(S_{t+1},\mathbf{w})-\hat{v}(S_{t},\mathbf{w})\right)\right]\mathbb{E}_{b}\left[\rho_{t}\nabla\delta_{t}\right]\\&=\mathbf{w}_{t}+\alpha\left[\mathbb{E}_{b}\left[\rho_{t}\left(R_{t+1}+\gamma\hat{v}(S_{t+1},\mathbf{w})\right)\right]-\hat{v}(S_{t},\mathbf{w})\right]\left[\nabla\hat{v}(S_{t},\mathbf{w})-\gamma\mathbb{E}_{b}\left[\rho_{t}\nabla\hat{v}(S_{t+1},\mathbf{w})\right]\right]\end{aligned}$$ This update and various ways of sampling it are referred to as the residual-gradient algorithm.

#### Annotation 4362563357964

#computer-science #machine-learning #reinforcement-learning
But this is naive, because the equation above involves the next state, $S_{t+1}$, appearing in two expectations that are multiplied together. To get an unbiased sample of the product, two independent samples of the next state are required, but during normal interaction with an external environment only one is obtained.
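
The bias is easy to demonstrate numerically. In this hedged sketch a plain Gaussian stands in for the next-state quantity: squaring a single sample estimates $\mathbb{E}[Z^2]$, not $(\mathbb{E}[Z])^2$, and only two independent samples give an unbiased product.

```python
import numpy as np

rng = np.random.default_rng(0)
# Z ~ N(1, 2^2): (E[Z])^2 = 1, but E[Z^2] = 1 + 4 = 5.
z = rng.normal(loc=1.0, scale=2.0, size=(100_000, 2))

same_sample = np.mean(z[:, 0] * z[:, 0])   # reuses one sample -> biased
independent = np.mean(z[:, 0] * z[:, 1])   # two independent samples -> unbiased
```

`same_sample` concentrates near 5 while `independent` concentrates near 1, which is exactly why the residual-gradient algorithm needs a second, independent sample of the next state.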

#### Annotation 4362564930828

#computer-science #machine-learning #reinforcement-learning
In either of these cases the residual-gradient algorithm is guaranteed to converge to a minimum of the BE under the usual conditions on the step-size parameter.

#### Annotation 4362566503692

#computer-science #machine-learning #reinforcement-learning
This example shows intuitively that minimizing the BE (which the residual-gradient algorithm surely does) may not be a desirable goal.

#### Annotation 4364202282252

#computer-science #machine-learning #reinforcement-learning
If an objective cannot be learned, it does indeed draw its utility into question. In the case of the VE, however, there is a way out. Note that the same solution, w = 1, is optimal for both MRPs above (assuming µ is the same for the two indistinguishable states in the right MRP). Is this a coincidence, or could it be generally true that all MDPs with the same data distribution also have the same optimal parameter vector? If this is true—and we will show next that it is—then the VE remains a usable objective. The VE is not learnable, but the parameter that optimizes it is!

#### Annotation 4364203855116

#computer-science #machine-learning #reinforcement-learning
One error that is always observable is that between the value estimate at each time and the return from that time. The Mean Square Return Error, denoted RE, is the expectation, under µ, of the square of this error. In the on-policy case the RE can be written $$\mathrm{RE}(\mathbf{w})=\mathbb{E}\left[\left(G_{t}-\hat{v}(S_{t},\mathbf{w})\right)^{2}\right]=\mathrm{VE}(\mathbf{w})+\mathbb{E}\left[\left(G_{t}-v_{\pi}(S_{t})\right)^{2}\right]. \quad (11.24)$$

#### Annotation 4364205690124

#computer-science #machine-learning #reinforcement-learning
Thus, the BE is not learnable; it cannot be estimated from feature vectors and other observable data. This limits the BE to model-based settings.

#### Annotation 4364207262988

#computer-science #machine-learning #reinforcement-learning
Figure 11.4: Causal relationships among the data distribution, MDPs, and various objectives. Left, Monte Carlo objectives: Two different MDPs can produce the same data distribution yet also produce different VEs, proving that the VE objective cannot be determined from data and is not learnable. However, all such VEs must have the same optimal parameter vector, w*! Moreover, this same w* can be determined from another objective, the RE, which is uniquely determined from the data distribution. Thus w* and the RE are learnable even though the VEs are not. Right, Bootstrapping objectives: Two different MDPs can produce the same data distribution yet also produce different BEs and have different minimizing parameter vectors; these are not learnable from the data distribution. The PBE and TDE objectives and their (different) minima can be directly determined from data and thus are learnable.

Article 4364208835852

Machine Learning
#has-images #learning #machine #statistics

#### Annotation 4365471059212

performance anxiety is not a single global undifferentiated response. There are actually three separate components. There’s a (1) physical component, a (2) mental component, and an (3) emotional component.

#### Annotation 4365473156364

As it turns out, the psychological response (or cognitive anxiety, as sport psychologists call it) is more predictive of performance quality than the physical response.

#### Annotation 4365474729228

Each of us has a unique individual zone of optimal functioning, where we will tend to have our best performances.

#### Annotation 4365476302092

Studies of athletes ranging from track and field to pistol shooting find that most individuals do their best at a mid to high range of anxiety (or activation, as sport psychologists call it).

#### Annotation 4365477874956

The tiny blood vessels which get blood to your toes, fingers, ears and nose constrict, forcing additional oxygen to your major organs and reducing the risk of blood loss in case a limb were to get chopped off. Of course, this leaves you with cold hands and feet (and perhaps a cold nose and ears as well), along with numbness and tingling.

#### Annotation 4365479447820

Your blood pressure is elevated as your body works harder to circulate blood through your body. Heat builds up in those areas where major organs are being primed for action—the head, chest and stomach. Frequently, the body must begin sweating in order not to overheat, and you can often feel the heart pumping more quickly than normal as it maintains the higher blood pressure. Of course, you can't just sweat in some places, you sweat everywhere, so your palms can get sweaty, leaving you with cold, sticky, clammy hands.

#### Annotation 4365481807116

Then, your non-essential processes go offline. Digesting food is deemed a low priority under stress, so your digestive system takes a back seat. So you may get butterflies in your stomach as the food sits around waiting to be digested, and dry mouth, as salivary production has stopped.

#### Annotation 4365483379980

In a dangerous situation you need more accurate and complete information about your surroundings, which your body gets through heightened hearing and broader, long-distance vision. Many people also become more sensitized to motion. Of course, none of this is helping you focus on the task at hand. You may just be keenly aware of every little frown or look from the audience, and get spooked by things happening off to the side.

#### Annotation 4365484952844

Like a cat about to pounce, your body is prepared to spring into action with a burst of energy.

#### Annotation 4365486525708

One of the most noticeable things that happens is the change in brain wave frequency. We begin to produce more beta waves, oscillating at 13-30 cycles per second, rather than the slower 8-12 cycle per second alpha waves that seem to be more conducive to high-level performance.

#### Annotation 4365488098572

And our brain is suddenly way more attuned to threats, danger, things that could go wrong, all the negative stuff we don’t want to happen.

#### Annotation 4365489671436

we start worrying and try to exert too much conscious control over our playing.

#### Annotation 4365491244300

when we begin thinking too much about technique and attempt to exercise too much deliberate control over our muscles, we shift control back to the cerebral cortex and disrupt the cerebellum’s ability to run off these motor programs automatically, leading to mistakes.

#### Annotation 4365492817164

what were the two groups of golfers thinking about before and during their shot? What sorts of thoughts caused these differences? The worst performers engaged primarily in verbal, analytical self-coaching dialogue (e.g. "keep your head still"). The best performers reported that it was pretty simple. They were focused on the target (an image) and the general feel (kinesthetic) of a successfully executed shot.

#### Annotation 4365494390028

So, if you put the physical and mental effects together, now we find ourselves in a less than ideal emotional state where we are feeling anxiety, dread, fear, worry, even panic perhaps. And when we feel this way, instinct usually takes over. Now, instincts are usually good, in that instinct keeps us away from danger and can save our lives, but not in this case. In this case instincts tend to make us do pretty much the opposite of what would help us play our best. We tighten up, we overthink, we focus on the wrong things, and we play tentatively, carefully, cautiously.

#### Annotation 4365495962892

in uncertain situations we tend to have a defensive, careful, cautious mindset that is akin to playing not to lose, instead of playing to win. Not only does this not represent our best playing, but paradoxically it makes us more anxious, and more likely to make mistakes

#### Annotation 4365497535756

“You may not be able to stop the waves...but you can learn to surf.”

#### Annotation 4365499108620

The 7 steps of “Centering”

#### Annotation 4365500681484

The idea is to select a fixed point in the distance, somewhere that feels comfortable to rest your eyes. This point could be on your stand, the ground in front of you, or on the back row of the hall.

#### Annotation 4365502254348

So change “don’t miss the high note” to “nail the high note”. Change “Uh oh, I hope my bow doesn’t shake here” to “Easy, fluid bow changes”.

#### Annotation 4365503827212

you can hear Leon Fleisher talking about clear intentions in his own words.

#### Annotation 4365505400076

One of the most effective techniques for toning down the stress response involves learning how to breathe diaphragmatically. When stressed, our bodies have a tendency to revert to shallow, rapid, chest breathing. Doing so keeps us in the so-called fight-or-flight mode. Aside from being the most biomechanically efficient way to breathe, deep belly breathing allows us to activate the parasympathetic nervous system response (aka “rest and digest” mode), which is our body’s antidote for the fight-or-flight state.

#### Annotation 4365506972940

Interestingly, if you want to get your energy kicked up a notch, just reverse it, meaning, take in a regular length breath through your nose, and exhale quickly and forcefully through your mouth.

#### Annotation 4365508545804

the key to technical consistency and maximum performance from a physical standpoint often goes back to one’s ability to keep key muscles loose and free of excess tension.

#### Annotation 4365510118668

Progressive muscle relaxation (PMR)

#### Annotation 4369495493900

#learning #machine #statistics

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.[1][2]:2 Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop a conventional algorithm for effectively performing the task.

Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a field of study within machine learning, and focuses on exploratory data analysis through unsupervised learning.[3][4] In its application across business problems, machine learning is also referred to as predictive analytics.