BuboFlash - helps with learning

Tags

#reinforcement-learning

Question

Algorithm 1 Learn USFA with ε-greedy Q-learning

Require: ε, training tasks M, distribution D_z over ℝ^d, number of policies n_z

1: select initial state s ∈ S
2: for n_s steps do
3:   sample w uniformly at random from M
4:   {sample policies, possibly based on current task}
5:   for i ← 1, 2, ..., n_z do z_i ∼ D_z(·|w)
6:   if Bernoulli(ε) = 1 then a ← Uniform(A)
7:   else a ← [GPI]
8:   Execute action a and observe φ and s′
9:   for i ← 1, 2, ..., n_z do {Update ψ̃}
10:    a′ ← [\(a' \equiv \pi_i(s')\)]
11:    \(\theta \leftarrow \theta + \alpha \left[\boldsymbol{\phi} + \gamma\, \tilde{\boldsymbol{\psi}}\left(s', a', \mathbf{z}_{i}\right) - \tilde{\boldsymbol{\psi}}\left(s, a, \mathbf{z}_{i}\right)\right] \nabla_{\theta} \tilde{\boldsymbol{\psi}}\)
12:  s ← s′
13: return θ
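Line 11 is a semi-gradient TD step on the vector-valued successor features: the TD error φ + γψ̃(s′, a′, z_i) − ψ̃(s, a, z_i) has one component per cumulant, and each component scales the gradient of the matching output of ψ̃. The sketch below assumes a hypothetical linear approximator in which component j of ψ̃(s, a, z) is the dot product of a weight row θ_j with some feature vector x(s, a, z). The names `usfa_td_update`, `x_sa` and `x_next` and the default values of α and γ are illustrative, not taken from the paper. Under that assumption the gradient term reduces to the outer product of the TD error with x(s, a, z_i).

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # d rows (one per SF component) of k weights


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def usfa_td_update(theta: Matrix, x_sa: Vector, x_next: Vector, phi: Vector,
                   gamma: float = 0.9, alpha: float = 0.1) -> Matrix:
    """One semi-gradient TD step (Algorithm 1, line 11) for a HYPOTHETICAL linear USFA.

    Assumed parameterization: psi(s, a, z)[j] = dot(theta[j], x(s, a, z)).
    x_sa:   features x(s, a, z_i)
    x_next: features x(s', a', z_i), where a' is pi_i's action at s'
    phi:    observed cumulant vector (length d)
    """
    psi = [dot(row, x_sa) for row in theta]
    psi_next = [dot(row, x_next) for row in theta]  # bootstrap target, held fixed
    # Vector TD error: phi + gamma * psi(s', a', z_i) - psi(s, a, z_i)
    delta = [f + gamma * nxt - cur for f, nxt, cur in zip(phi, psi_next, psi)]
    # For linear psi, sum_j delta_j * grad_theta psi_j is the outer product of delta and x_sa
    return [[theta_jk + alpha * d_j * x_k for theta_jk, x_k in zip(row, x_sa)]
            for row, d_j in zip(theta, delta)]


# Example: d = 2 cumulants, k = 3 features, weights start at zero.
theta = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
theta = usfa_td_update(theta, x_sa=[1.0, 0.0, 0.0], x_next=[0.0, 1.0, 0.0], phi=[1.0, 2.0])
print(theta)  # [[0.1, 0.0, 0.0], [0.2, 0.0, 0.0]]
```

Treating the bootstrapped target as a constant (no gradient flows through ψ̃(s′, a′, z_i)) is the usual Q-learning convention; with a neural-network ψ̃ the outer product would be replaced by backpropagation through ψ̃(s, a, z_i).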

Answer

Line 7 (GPI): \(\operatorname{argmax}_{b} \max _{i} \tilde{\boldsymbol{\psi}}\left(s, b, \mathbf{z}_{i}\right)^{\top} \mathbf{w}\)

Line 10: \(\operatorname{argmax}_{b} \tilde{\boldsymbol{\psi}}\left(s', b, \mathbf{z}_{i}\right)^{\top} \mathbf{z}_{i}\)
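The two answers differ in which vector scores the successor features. GPI (line 7) evaluates every sampled policy π_i on the current task w and acts greedily with respect to the best of them, while the bootstrap action a′ (line 10) is the action π_i itself takes at s′, i.e. greedy with respect to its own vector z_i. A minimal sketch of both, assuming (hypothetically) that the relevant ψ̃ values have already been computed into nested lists indexed as `psi[i][b]`; the function names are illustrative only.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def gpi_action(psi_s: List[List[Vector]], w: Vector) -> int:
    """Line 7 (GPI): argmax_b max_i psi(s, b, z_i)^T w.

    Assumed layout: psi_s[i][b] holds psi(s, b, z_i).
    """
    n_actions = len(psi_s[0])
    # For each action b, the best value any sampled policy pi_i estimates on task w
    best = [max(dot(psi_i[b], w) for psi_i in psi_s) for b in range(n_actions)]
    return max(range(n_actions), key=best.__getitem__)  # first maximiser on ties


def policy_action(psi_next_i: List[Vector], z_i: Vector) -> int:
    """Line 10: a' = pi_i(s') = argmax_b psi(s', b, z_i)^T z_i.

    Assumed layout: psi_next_i[b] holds psi(s', b, z_i).
    """
    values = [dot(psi_b, z_i) for psi_b in psi_next_i]
    return max(range(len(values)), key=values.__getitem__)


# Example: two sampled policies (i = 0, 1), three actions, d = 2.
psi_s = [[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
         [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]]
print(gpi_action(psi_s, [1.0, 0.0]))  # 1
psi_next_0 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # psi(s', b, z_0) for b = 0, 1, 2
print(policy_action(psi_next_0, [1.0, 0.0]))  # 0
```

In the example, policy 0 alone would choose action 0 on w = (1, 0) (estimated value 1.0), but GPI picks action 1 because policy 1 estimates 2.0 for it: GPI acts greedily with respect to the maximum over all sampled policies rather than any single one.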


Source: Universal Successor Features Approximators, p. 6
