Edited, memorised or added to reading queue

on 09-Sep-2019 (Mon)


Flashcard 4361766964492

#computer-science #machine-learning #reinforcement-learning
Among the algorithms investigated so far in this book, only the [...] methods are true SGD methods. These methods converge robustly under both on-policy and off-policy training as well as for general nonlinear (differentiable) function approximators, though they are often slower than semi-gradient methods with bootstrapping, which are not SGD methods.
Monte Carlo
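
As an illustration of the distinction this card draws, here is a minimal sketch (not taken from the book) of the two update rules for a linear value estimate. The feature vectors, the step size `alpha`, and the function names are assumptions made for the example. The Monte Carlo step follows the full gradient of the squared error against the observed return `G`, which does not depend on the weights. The TD(0) step bootstraps from the current weights but ignores the target's dependence on them, which is why it is only a semi-gradient method.

```python
import numpy as np

def gradient_mc_update(w, x, G, alpha):
    """One true SGD step on 0.5 * (G - w.x)^2; the return G is independent of w."""
    return w + alpha * (G - w @ x) * x

def semi_gradient_td0_update(w, x, r, x_next, gamma, alpha):
    """One semi-gradient TD(0) step: the target r + gamma * w.x_next depends on w,
    but its gradient with respect to w is deliberately ignored."""
    target = r + gamma * (w @ x_next)
    return w + alpha * (target - w @ x) * x

def mc_loss(w, x, G):
    """Squared-error objective that the Monte Carlo update descends."""
    return 0.5 * (G - w @ x) ** 2

# Hypothetical two-feature example (all numbers are illustrative).
w = np.array([0.5, -0.2])
x = np.array([1.0, 2.0])
x_next = np.array([0.0, 1.0])

# Finite-difference gradient of the MC loss, to compare against the MC step.
eps = 1e-6
num_grad = np.array([
    (mc_loss(w + eps * e, x, 3.0) - mc_loss(w - eps * e, x, 3.0)) / (2 * eps)
    for e in np.eye(2)
])

w_mc = gradient_mc_update(w, x, G=3.0, alpha=0.1)
w_td = semi_gradient_td0_update(w, x, r=1.0, x_next=x_next, gamma=0.9, alpha=0.1)
```

In this sketch `w_mc` coincides with `w - alpha * num_grad`, so the Monte Carlo step is a gradient-descent step on a fixed objective. No such fixed objective exists for the TD(0) step, because its target moves whenever `w` changes.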


Parent (intermediate) annotation

Monte Carlo methods are true SGD methods. These methods converge robustly under both on-policy and off-policy training as well as for general nonlinear (differentiable) function approximators, though


The blood or plasma level of a drug plays a central role in pharmacokinetic analyses. The time course of the concentration can be used to perform pharmacokinetic calculations and to develop corresponding models.

… linear kinetics, or first-order kinetics. This applies to almost all drugs used in anaesthesia at clinically usual doses

Flashcard 4378963348748

The main innovations of the FeUdal Networks paper were: [...] policy gradient for training the Manager; relative rather than absolute goals; lower temporal resolution for Manager; intrinsic motivation for the Worker.


Flashcard 4378966494476

What type of architecture do FeUdal Networks use so that the Manager operates at a lower temporal resolution?
Dilated LSTM: We propose a novel RNN architecture for the Manager, which operates at lower temporal resolution than the data stream. We define a dilated LSTM analogously to dilated convolutional networks (Yu & Koltun, 2016).
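
A minimal NumPy sketch of the idea may help when recalling this card. It assumes the usual reading of a dilated LSTM: the state is split into `r` groups ("cores") that share one set of weights, and at step `t` only core `t % r` is updated, so each core sees every `r`-th input. The names `lstm_cell` and `dilated_lstm`, the gate ordering, and the weight layout are assumptions for illustration, and any pooling of outputs across recent steps is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W, b):
    """Standard LSTM cell; W maps the concatenation [x, h] to four gate pre-activations."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)          # input, forget, output gates and candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def dilated_lstm(xs, W, b, hidden, r):
    """Run a dilated LSTM with dilation radius r over the sequence xs.

    Keeps r separate (h, c) cores with shared weights; step t updates core t % r only.
    """
    h = np.zeros((r, hidden))
    c = np.zeros((r, hidden))
    outputs = []
    for t, x in enumerate(xs):
        k = t % r                         # index of the core that is active at step t
        h[k], c[k] = lstm_cell(x, h[k], c[k], W, b)
        outputs.append(h[k].copy())
    return np.array(outputs), h, c

# Illustrative usage with random weights (sizes are arbitrary).
rng = np.random.default_rng(0)
n_in, hidden, r = 3, 4, 2
W = rng.normal(size=(4 * hidden, n_in + hidden))
b = np.zeros(4 * hidden)
xs = rng.normal(size=(6, n_in))
outputs, h_final, c_final = dilated_lstm(xs, W, b, hidden, r)
```

With `r = 2`, core 0 processes inputs 0, 2, 4 and core 1 processes inputs 1, 3, 5, so each core's memory spans twice as many environment steps as a plain LSTM run on the same stream.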


Flashcard 4378968853772

Visualise OR draw the network diagram for FeUdal networks. [Maximum score of 3 for correct visualisation alone, must draw to score 4+]

Figure 1. The schematic illustration of FuN (section 3)


Flashcard 4378974096652

What is the update rule (policy gradient) used for the Manager in FeUdal Networks?

\(\nabla g_{t}=A_{t}^{M} \nabla_{\theta} d_{\cos }\left(s_{t+c}-s_{t}, g_{t}(\theta)\right)\)

where \(A_{t}^{M}=R_{t}-V_{t}^{M}\left(x_{t}, \theta\right)\) is the Manager’s advantage function, computed using a value function estimate \(V_{t}^{M}\left(x_{t}, \theta\right)\) from the internal critic; \(d_{\cos }\left(\alpha, \beta\right) = \alpha^T\beta /\left(|\alpha||\beta|\right)\) is the cosine similarity between two vectors.
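
The formula above can be sketched numerically as follows. This is an illustrative NumPy version, not the paper's implementation: it computes the advantage-weighted gradient of the cosine similarity with respect to the goal vector \(g_t\) itself, and leaves the chain rule through \(g_t(\theta)\) to whatever autodiff framework trains the Manager. The function names and the toy numbers are assumptions.

```python
import numpy as np

def d_cos(a, b):
    """Cosine similarity a.b / (|a| |b|) between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def grad_d_cos_wrt_goal(a, g):
    """Analytic gradient of d_cos(a, g) with respect to g."""
    na, ng = np.linalg.norm(a), np.linalg.norm(g)
    return a / (na * ng) - (a @ g) * g / (na * ng ** 3)

def manager_goal_gradient(s_t, s_tc, g_t, R_t, V_t):
    """Advantage-weighted gradient A^M * grad_g d_cos(s_{t+c} - s_t, g_t).

    s_tc stands for the latent state c steps later, s_{t+c}.
    """
    advantage = R_t - V_t                 # A^M_t = R_t - V^M_t
    return advantage * grad_d_cos_wrt_goal(s_tc - s_t, g_t)

# Toy example: the state moved along the first axis, the goal pointed along the second.
update = manager_goal_gradient(
    s_t=np.zeros(2),
    s_tc=np.array([1.0, 0.0]),
    g_t=np.array([0.0, 1.0]),
    R_t=3.0,
    V_t=1.0,
)
```

In the toy example the advantage is positive, so the resulting direction `update` points along the observed state change \(s_{t+c}-s_t\): goals are pulled toward directions in latent space that turned out better than the critic expected, and pushed away from them when the advantage is negative.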


CARD 1 Hiragana ≠ お め あ 1. あれ 2. あそこ 3. あした 4. あります 5. あいさつ 6. ありがとう