Edited, memorised or added to reading queue on 09-Sep-2019 (Mon)


Flashcard 4361766964492

Tags
#computer-science #machine-learning #reinforcement-learning
Question
Among the algorithms investigated so far in this book, only the [...] methods are true SGD methods. These methods converge robustly under both on-policy and off-policy training as well as for general nonlinear (differentiable) function approximators, though they are often slower than semi-gradient methods with bootstrapping, which are not SGD methods.
Answer
Monte Carlo
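For contrast, a minimal sketch of the distinction (illustrative code, assuming linear function approximation; none of these names come from the book):

import numpy as np

def gradient_mc_update(w, x_t, G_t, alpha):
    # True SGD: the Monte Carlo target G_t (the observed return) does not
    # depend on w, so this is the exact gradient of the squared error.
    return w + alpha * (G_t - w @ x_t) * x_t

def semi_gradient_td0_update(w, x_t, r_next, x_next, alpha, gamma):
    # Semi-gradient: the bootstrapped target r + gamma * w.x_next depends on w,
    # but that dependence is ignored when differentiating -- hence "semi",
    # and hence the weaker convergence guarantees noted above.
    target = r_next + gamma * (w @ x_next)
    return w + alpha * (target - w @ x_t) * x_t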


The blood or plasma level of a drug plays a central role in pharmacokinetic analyses. From the time course of the concentration, pharmacokinetic calculations can be carried out and corresponding models developed.

…linear kinetics, or first-order kinetics. This applies to nearly all drugs used in anaesthesia at clinically typical doses.
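As a standard aside (textbook pharmacokinetics, not taken from this excerpt): first-order kinetics means the elimination rate is proportional to the current concentration, so the concentration falls exponentially:

\(\frac{dC}{dt}=-k_{e}C \quad\Rightarrow\quad C(t)=C_{0}e^{-k_{e}t}\)

where \(C_{0}\) is the initial concentration and \(k_{e}\) the elimination rate constant.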

Flashcard 4378963348748

Tags
#reinforcement-learning
Question
The main innovations of the FeUdal Networks paper were: [...] policy gradient for training the Manager; relative rather than absolute goals; lower temporal resolution for the Manager; intrinsic motivation for the Worker.
Answer
transition
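To make the "intrinsic motivation for the Worker" point concrete, the paper rewards the Worker for moving in the goal directions the Manager emitted (reproduced from memory of the FuN paper, so treat as approximate):

\(r_{t}^{I}=\frac{1}{c} \sum_{i=1}^{c} d_{\cos }\left(s_{t}-s_{t-i}, g_{t-i}\right)\)

where \(c\) is the Manager's horizon and \(d_{\cos}\) is the cosine similarity used in the Manager's update (see the flashcard below).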


Flashcard 4378966494476

Tags
#reinforcement-learning
Question
What type of architecture do FeUdal Networks use so that the Manager operates at a lower temporal resolution?
Answer
Dilated LSTM: We propose a novel RNN architecture for the Manager, which operates at a lower temporal resolution than the data stream. We define a dilated LSTM analogously to dilated convolutional networks (Yu & Koltun, 2016).
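A minimal sketch of the scheduling idea (illustrative code, not the paper's implementation; the full dLSTM also pools outputs across groups, which is omitted here): the state is split into r independent groups, and at time t only group t mod r is updated, so each group sees the input stream at 1/r of its rate.

def dilated_lstm_step(states, x, t, lstm_step):
    # states: list of r independent LSTM states; lstm_step is any ordinary
    # LSTM cell update with signature (state, x) -> (state, output).
    r = len(states)
    i = t % r                      # round-robin: one group per time step
    states[i], out = lstm_step(states[i], x)
    return states, out             # each group is updated every r steps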


Flashcard 4378968853772

Tags
#reinforcement-learning
Question
Visualise OR draw the network diagram for FeUdal networks. [Maximum score of 3 for correct visualisation alone, must draw to score 4+]
Answer

[Figure 1: the schematic illustration of FuN (section 3); the image itself is not preserved in this export.]



Flashcard 4378974096652

Tags
#reinforcement-learning
Question
What is the update rule (policy gradient) used for the Manager in FeUdal Networks?
Answer

\(\nabla g_{t}=A_{t}^{M} \nabla_{\theta} d_{\cos }\left(s_{t+c}-s_{t}, g_{t}(\theta)\right)\)

where \(A_{t}^{M}=R_{t}-V_{t}^{M}\left(x_{t}, \theta\right)\) is the Manager’s advantage function, computed using a value function estimate \(V_{t}^{M}\left(x_{t}, \theta\right)\) from the internal critic; \(d_{\cos }\left(\alpha, \beta\right) = \alpha^T\beta /\left(|\alpha||\beta|\right)\) is the cosine similarity between two vectors.
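Read concretely, the advantage scales the gradient of the cosine similarity between the realised state change \(s_{t+c}-s_{t}\) and the emitted goal \(g_{t}\). An illustrative numpy sketch of the two ingredients (names assumed, not the paper's code):

import numpy as np

def d_cos(a, b):
    # Cosine similarity, exactly as defined above.
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def manager_advantage(R_t, V_t):
    # A^M_t = R_t - V^M_t(x_t, theta): return minus the internal critic's estimate.
    return R_t - V_t

# In practice an autodiff framework differentiates d_cos(s_delta, g_t(theta))
# with respect to theta, treating the state change s_delta as a constant target.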



CARD 1 Hiragana ≠ お め あ
1. あれ
2. あそこ
3. あした
4. あります
5. あいさつ
6. ありがとう