Edited, memorised or added to reading list

on 09-Sep-2019 (Mon)


Flashcard 4361766964492

Tags
#computer-science #machine-learning #reinforcement-learning
Question
Among the algorithms investigated so far in this book, only the [...] methods are true SGD methods. These methods converge robustly under both on-policy and off-policy training as well as for general nonlinear (differentiable) function approximators, though they are often slower than semi-gradient methods with bootstrapping, which are not SGD methods.
Answer
Monte Carlo
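As a concrete sketch of why Monte Carlo prediction is a true SGD method: with linear function approximation, the update w ← w + α (G − v̂(s, w)) ∇v̂(s, w) descends the gradient of the squared error against the actual return G, which does not depend on w (unlike a bootstrapped target). The feature vectors and returns below are illustrative, not from any specific environment.

```python
import numpy as np

def gradient_mc_update(w, alpha, episode):
    """Gradient Monte Carlo value-prediction update.

    episode: list of (feature_vector, return_G) pairs for visited states.
    With linear v_hat(s, w) = w . x(s), the gradient of v_hat is x(s).
    """
    for x, G in episode:
        v_hat = w @ x                        # current estimate of v(s)
        w = w + alpha * (G - v_hat) * x      # true SGD step toward return G
    return w

rng = np.random.default_rng(0)
w = np.zeros(4)
episode = [(rng.normal(size=4), 1.0), (rng.normal(size=4), 0.5)]
w = gradient_mc_update(w, 0.1, episode)
```

Because the target G is fixed for the episode, each step is an unbiased SGD step; a semi-gradient TD update would instead bootstrap from v̂ of the next state, which is why it is not true SGD.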










The blood or plasma level of a drug plays a central role in pharmacokinetic analyses. From the time course of the concentration, pharmacokinetic calculations can be performed and corresponding models developed.





…linear kinetics, or first-order kinetics. This holds for nearly all drugs used in anaesthesia at clinically common dosages.
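Under first-order kinetics the plasma concentration declines exponentially, C(t) = C₀ · e^(−kt) with k = ln(2)/t½. A minimal sketch, with hypothetical dose and half-life values:

```python
import math

def concentration(c0, t_half, t):
    """Plasma concentration under first-order elimination.

    c0: initial concentration, t_half: elimination half-life,
    t: elapsed time (same units as t_half).
    """
    k = math.log(2) / t_half       # elimination rate constant
    return c0 * math.exp(-k * t)

# After exactly one half-life, half of the drug remains:
c = concentration(10.0, 2.0, 2.0)  # -> 5.0
```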





Flashcard 4378963348748

Tags
#reinforcement-learning
Question
The main innovations of the FeUdal Networks paper were: [...] policy gradient for training the Manager; relative rather than absolute goals; lower temporal resolution for Manager; intrinsic motivation for the Worker.
Answer
transition









Flashcard 4378966494476

Tags
#reinforcement-learning
Question
What type of architecture do FeUdal Networks use so that the Manager operates at a lower temporal resolution?
Answer
(Dilated LSTM) : We propose a novel RNN architecture for the Manager, which operates at lower temporal resolution than the data stream. We define a dilated LSTM analogously to dilated convolutional networks (Yu & Koltun, 2016)
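The dilation idea can be sketched as follows: keep r independent recurrent state groups and, at time t, update only group t mod r, so each group effectively sees the input stream at 1/r temporal resolution. For brevity a plain tanh-RNN cell stands in for the LSTM core; the class name, shapes, and initialisation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class DilatedRNN:
    """Sketch of a dilated recurrent net: r state groups, one ticks per step."""

    def __init__(self, input_dim, hidden_dim, r, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        self.Wh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.states = [np.zeros(hidden_dim) for _ in range(r)]
        self.r = r

    def step(self, x, t):
        g = t % self.r                           # which state group ticks now
        h = np.tanh(self.Wx @ x + self.Wh @ self.states[g])
        self.states[g] = h                       # only this group is updated
        return h
```

Each group's recurrent state is only refreshed every r steps, giving the Manager a longer effective horizon while the network still emits an output at every time step.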









Flashcard 4378968853772

Tags
#reinforcement-learning
Question
Visualise OR draw the network diagram for FeUdal networks. [Maximum score of 3 for correct visualisation alone, must draw to score 4+]
Answer

Figure 1. The schematic illustration of FuN (section 3)









Flashcard 4378974096652

Tags
#reinforcement-learning
Question
What is the update rule (policy gradient) used for the Manager in FeUdal Networks?
Answer

\(\nabla g_{t}=A_{t}^{M} \nabla_{\theta} d_{\cos }\left(s_{t+c}-s_{t}, g_{t}(\theta)\right)\)

where \(A_{t}^{M}=R_{t}-V_{t}^{M}\left(x_{t}, \theta\right)\) is the Manager’s advantage function, computed using a value function estimate \(V_{t}^{M}\left(x_{t}, \theta\right)\) from the internal critic; \(d_{\cos }\left(\alpha, \beta\right) = \alpha^T\beta /\left(|\alpha||\beta|\right)\) is the cosine similarity between two vectors.
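A small numeric sketch of this update: the goal g is pushed in the direction that increases the cosine similarity between g and the observed state transition s_{t+c} − s_t, scaled by the advantage. A finite-difference gradient with respect to g is used here for illustration only; in the paper the gradient flows through the network parameters θ, and all values below are assumptions.

```python
import numpy as np

def d_cos(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def manager_goal_gradient(s_t, s_tc, g, advantage, eps=1e-6):
    """Finite-difference gradient of A * d_cos(s_{t+c} - s_t, g) w.r.t. g."""
    delta_s = s_tc - s_t
    base = d_cos(delta_s, g)
    grad = np.zeros_like(g)
    for i in range(len(g)):
        g_plus = g.copy()
        g_plus[i] += eps
        grad[i] = advantage * (d_cos(delta_s, g_plus) - base) / eps
    return grad
```

With a positive advantage, a small ascent step along this gradient rotates g toward the direction the state actually moved, which is exactly the "transition" aspect of the Manager's policy gradient.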









CARD 1 Hiragana ≠ お め あ
1. あれ
2. あそこ
3. あした
4. あります
5. あいさつ
6. ありがとう
