Edited, memorised or added to reading list on 19-Sep-2019 (Thu)

Flashcard 4395227024652

Tags
#regression #statistics #supervised-learning
Question
How does locally weighted regression work? Think of the locally weighted linear regression cost function and setup.
Answer

In contrast to ordinary (unweighted) linear regression, the locally weighted linear regression algorithm does the following:

1. Fit θ to minimize \(\sum_{i} w^{(i)}\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}\), where \(w^{(i)}\) is a weight that depends on training example \(x^{(i)}\text{'s}\) distance from the query point \(x\), e.g. \(w^{(i)}=\exp \left(-\frac{\left(x^{(i)}-x\right)^{2}}{2 \tau^{2}}\right)\)

2. Output \(\theta^Tx\)
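The two steps above can be sketched in NumPy as a closed-form weighted least squares solve; the function name and data shapes are my own for illustration, not from the original text:

```python
import numpy as np

def locally_weighted_regression(X, y, x_query, tau=0.5):
    """Predict y at x_query by fitting theta to a weighted least-squares
    problem, with Gaussian kernel weights centred on the query point:
        w_i = exp(-||x_i - x_query||^2 / (2 * tau^2))
    """
    diffs = X - x_query
    w = np.exp(-np.sum(diffs**2, axis=1) / (2 * tau**2))
    W = np.diag(w)
    # Step 1: closed-form weighted least squares, theta = (X^T W X)^{-1} X^T W y
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    # Step 2: output theta^T x at the query point
    return x_query @ theta
```

Note that a fresh θ is fit for every query point, which is why this is a "local" (and comparatively expensive) method.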









#anki #learning #spaced_repetition
I began with the AlphaGo paper itself. I began reading it quickly, almost skimming. I wasn't looking for a comprehensive understanding. Rather, I was doing two things. One, I was trying to simply identify the most important ideas in the paper. What were the names of the key techniques I'd need to learn about? Second, there was a kind of hoovering process, looking for basic facts that I could understand easily, and that would obviously benefit me. Things like basic terminology, the rules of Go, and so on.


Augmenting Long-term Memory




#anki #learning #spaced_repetition
By contrast, had I used conventional note-taking in my original reading of the AlphaGo paper, my understanding would have more rapidly evaporated, and it would have taken longer to read the later papers. And so using Anki in this way gives confidence you will retain understanding over the long term. This confidence, in turn, makes the initial act of understanding more pleasurable, since you believe you're learning something for the long haul, not something you'll forget in a day or a week.


Augmenting Long-term Memory




#anki #learning #spaced_repetition
This doesn't mean reading every word in the paper. Rather, I'll add to Anki questions about the core claims, core questions, and core ideas of the paper. It's particularly helpful to extract Anki questions from the abstract, introduction, conclusion, figures, and figure captions. Typically I will extract anywhere from 5 to 20 Anki questions from the paper. It's usually a bad idea to extract fewer than 5 questions – doing so tends to leave the paper as a kind of isolated orphan in my memory. Later I find it difficult to feel much connection to those questions. Put another way: if a paper is so uninteresting that it's not possible to add 5 good questions about it, it's usually better to add no questions at all.


Augmenting Long-term Memory




#anki #learning #spaced_repetition
I said above that I typically spend 10 to 60 minutes Ankifying a paper, with the duration depending on my judgment of the value I'm getting from the paper. However, if I'm learning a great deal, and finding it interesting, I keep reading and Ankifying. Really good resources are worth investing time in. But most papers don't fit this pattern, and you quickly saturate. If you feel you could easily find something more rewarding to read, switch over. It's worth deliberately practicing such switches, to avoid building a counter-productive habit of completionism in your reading. It's nearly always possible to read deeper into a paper, but that doesn't mean you can't easily be getting more value elsewhere. It's a failure mode to spend too long reading unimportant papers.


Augmenting Long-term Memory




#anki #learning #spaced_repetition
You might suppose the foundation would be a shallow read of a large number of papers. In fact, to really grok an unfamiliar field, you need to engage deeply with key papers – papers like the AlphaGo paper. What you get from deep engagement with important papers is more significant than any single fact or technique: you get a sense for what a powerful result in the field looks like. It helps you imbibe the healthiest norms and standards of the field. It helps you internalize how to ask good questions in the field, and how to put techniques together. You begin to understand what made something like AlphaGo a breakthrough – and also its limitations, and the sense in which it was really a natural evolution of the field. Such things aren't captured individually by any single Anki question. But they begin to be captured collectively by the questions one asks when engaged deeply enough with key papers.


Augmenting Long-term Memory




#anki #learning #spaced_repetition

So, to get a picture of an entire field, I usually begin with a truly important paper, ideally a paper establishing a result that got me interested in the field in the first place. I do a thorough read of that paper, along the lines of what I described for AlphaGo. Later, I do thorough reads of other key papers in the field – ideally, I read the best 5-10 papers in the field. But, interspersed, I also do shallower reads of a much larger number of less important (though still good) papers. In my experimentation so far that means tens of papers, though I expect in some fields I will eventually read hundreds or even thousands of papers in this way.

You may wonder why I don't just focus on only the most important papers. Part of the reason is mundane: it can be hard to tell what the most important papers are. Shallow reads of many papers can help you figure out what the key papers are, without spending too much time doing deeper reads of papers that turn out not to be so important. But there's also a culture that one imbibes reading the bread-and-butter papers of a field: a sense for what routine progress looks like, for the praxis of the field. That's valuable too, especially for building up an overall picture of where the field is at, and to stimulate questions on my own part. Indeed, while I don't recommend spending a large fraction of your time reading bad papers, it's certainly possible to have a good conversation with a bad paper. Stimulus is found in unexpected places.


Augmenting Long-term Memory




Flashcard 4395239345420

Tags
#reinforcement-learning
Question
How does IMPALA achieve stable distributed learning at high throughput?
Answer
By combining decoupled acting and learning with a novel off-policy correction method called V-trace.









Flashcard 4395241704716

Tags
#reinforcement-learning
Question
What does the IMPALA architecture allow?
Answer
Distributed acting and learning that not only uses resources more efficiently in single-machine training, but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.









Flashcard 4395244064012

Tags
#reinforcement-learning
Question
In its paper, what was the IMPALA architecture used for?
Answer
Distributed training in the multi-task RL setting.









Flashcard 4395246423308

Question
Why does a distributed training architecture such as IMPALA need an off-policy correction?
Answer
Because the policy used to generate a trajectory (a worker's policy) can lag behind the policy on the learner by several updates at the time of gradient calculation.









Flashcard 4395248782604

Tags
#reinforcement-learning
Question
Aside from achieving higher data throughputs, what are some other ways in which the IMPALA training architecture outperforms A3C?
Answer
Crucially, IMPALA is also more data-efficient than A3C-based agents and more robust to hyperparameter values and network architectures, allowing it to make better use of deeper neural networks.









Flashcard 4395251141900

Tags
#has-images #reinforcement-learning
Question
Reproduce (by visualising mentally or by drawing) the layout of learners and trainers in the IMPALA architecture.
Answer

Figure 1. Left: Single Learner. Each actor generates trajectories and sends them via a queue to the learner. Before starting the next trajectory, the actor retrieves the latest policy parameters from the learner. Right: Multiple Synchronous Learners. Policy parameters are distributed across multiple learners that work synchronously.









Flashcard 4395258219788

Tags
#machine-learning
Question
Describe 2 ways in which stale gradients arise when using distributed training of models.
Answer
Firstly, the read operation (Algorithm 1, line 1) on a worker may be interleaved with updates by other workers to different parameter servers, so the resultant \(\hat{\theta}^k\) may not be consistent with any parameter incarnation \(\theta(t)\). Secondly, model updates may have occurred while a worker is computing its stochastic gradient; hence, the resultant gradients are typically computed with respect to outdated parameters.


Revisiting Distributed Synchronous SGD – arXiv Vanity







Flashcard 4395260579084

Question
What is the setup of a synchronous version of distributed stochastic gradient descent?
Answer
In a synchronous version of distributed stochastic gradient descent (Sync-SGD), or more generally Synchronous Stochastic Optimization (Sync-Opt), the parameter servers wait for all workers to send their gradients, aggregate them, and send the updated parameters to all workers afterward. This ensures that the actual algorithm is a true mini-batch stochastic gradient descent, with an effective batch size equal to the sum of all the mini-batch sizes of the workers.
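The aggregation step can be sketched as follows; this is a minimal illustration assuming equal mini-batch sizes and per-worker mean gradients, with function and parameter names of my own choosing:

```python
import numpy as np

def sync_sgd_step(params, worker_grads, lr=0.1):
    """One synchronous update: average the gradients from all workers,
    then apply a single SGD step.

    When each worker's gradient is the mean over its own mini-batch (all
    of equal size), averaging the worker gradients equals the gradient of
    the combined batch, so the effective batch size is the sum of the
    workers' mini-batch sizes.
    """
    mean_grad = np.mean(worker_grads, axis=0)
    return params - lr * mean_grad
```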


Revisiting Distributed Synchronous SGD – arXiv Vanity







Flashcard 4395262938380

Tags
#machine-learning
Question
What is the straggler problem of the distributed training algorithm Sync-SGD?
Answer
While this synchronous approach solves the staleness problem, it also introduces the potential problem that the actual update time now depends on the slowest worker.


Revisiting Distributed Synchronous SGD – arXiv Vanity







Flashcard 4395265297676

Tags
#machine-learning
Question
In Sync-SGD from the paper 'Revisiting Synchronous SGD', how is the straggler problem overcome?
Answer
To alleviate the straggler problem, we introduce backup workers (tail-at-scale) as follows: instead of having only N workers, we add b extra workers, but as soon as the parameter servers receive gradients from any N workers, they stop waiting and update their parameters using those N gradients. The slowest b workers' gradients are dropped when they arrive.
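The effect of backup workers on step time can be seen in a toy simulation; this is not from the paper, and the exponential timing model and function names are assumptions purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_update_time(num_workers, num_backups, n_trials=10000):
    """Simulate the per-step update time of Sync-SGD with backup workers.

    Each worker's gradient computation time is drawn from an exponential
    distribution (a crude stand-in for stragglers). The parameter server
    waits only for the fastest num_workers of num_workers + num_backups
    gradients, so the step finishes at the num_workers-th order statistic.
    """
    times = rng.exponential(1.0, size=(n_trials, num_workers + num_backups))
    times.sort(axis=1)
    return times[:, num_workers - 1].mean()
```

Under this model, even a handful of backup workers sharply reduces the tail latency: with b = 0 the server waits for the maximum over N draws, while with b > 0 the slowest stragglers are simply dropped.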


Revisiting Distributed Synchronous SGD – arXiv Vanity







Flashcard 4395269491980

Tags
#reinforcement-learning
Question
How does the IMPALA paper propose to learn off-policy?
Answer
They introduce a novel off-policy actor-critic algorithm for the learner, called V-trace.









Flashcard 4395271851276

Tags
#reinforcement-learning
Question
How does the V-trace target used in the IMPALA distributed training architecture define a fixed point (in value-function space) that interpolates between the behaviour and target value functions?
Answer
Two types of truncated importance-sampling weights (IS ratios clipped at a threshold) appear in the definition of the update. They allow the target to be tuned between a very small IS truncation (effectively one-step learning, converging slowly toward the behaviour policy's value function) and a large IS truncation (learning from several correctly weighted steps toward the target policy's value function).
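The backward recursion behind the V-trace target can be sketched numerically; this is a simplified, unbatched illustration of the truncated-IS recursion (function and variable names are mine, and rho_bar / c_bar stand for the two truncation thresholds mentioned above):

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets for one trajectory.

    rhos[t] is the importance ratio pi(a_t|s_t) / mu(a_t|s_t) between the
    learner's (target) policy and the actor's (behaviour) policy.
    """
    T = len(rewards)
    clipped_rhos = np.minimum(rho_bar, rhos)   # weight on the TD errors
    cs = np.minimum(c_bar, rhos)               # weight on the trace
    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + gamma * values_tp1 - values)
    # Backward recursion: vs_t - V(s_t) = delta_t + gamma * c_t * (vs_{t+1} - V(s_{t+1}))
    vs_minus_v = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v
```

When the actor and learner policies coincide (all ratios equal 1) the targets reduce to ordinary n-step returns; shrinking rho_bar and c_bar pulls the fixed point toward the behaviour policy's value function.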

