

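Algorithm 1 Learn USFA with ε-greedy Q-learning

Require: \(\varepsilon\), training tasks \(\mathcal{M}\), distribution \(\mathcal{D}_z\) over \(\mathbb{R}^d\), number of policies \(n_z\)

1: select initial state \(s \in \mathcal{S}\)

2: for \(n_s\) steps do

  3: sample \(\mathbf{w}\) uniformly at random from \(\mathcal{M}\)

  4: {sample policies, possibly based on current task}

  5: for \(i \leftarrow 1, 2, \ldots, n_z\) do \(\mathbf{z}_i \sim \mathcal{D}_z(\cdot \mid \mathbf{w})\)

  6: if \(\text{Bernoulli}(\varepsilon) = 1\) then \(a \leftarrow \text{Uniform}(\mathcal{A})\)

  7: else \(a \leftarrow \operatorname{argmax}_{b} \max_{i} \tilde{\boldsymbol{\psi}}(s, b, \mathbf{z}_i)^{\top} \mathbf{w}\) {GPI}

  8: execute action \(a\) and observe \(\boldsymbol{\phi}\) and \(s'\)

  9: for \(i \leftarrow 1, 2, \ldots, n_z\) do {update \(\tilde{\boldsymbol{\psi}}\)}

    10: \(a' \leftarrow \operatorname{argmax}_{b} \tilde{\boldsymbol{\psi}}(s', b, \mathbf{z}_i)^{\top} \mathbf{z}_i\) {\(a' \equiv \pi_i(s')\)}

    11: \(\theta \leftarrow \theta + \alpha \left[\boldsymbol{\phi} + \gamma \tilde{\boldsymbol{\psi}}(s', a', \mathbf{z}_i) - \tilde{\boldsymbol{\psi}}(s, a, \mathbf{z}_i)\right] \nabla_{\theta} \tilde{\boldsymbol{\psi}}(s, a, \mathbf{z}_i)\) {\(\alpha\): step size}

  12: \(s \leftarrow s'\)

13: return \(\theta\)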
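The loop above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear approximator \(\tilde{\boldsymbol{\psi}}(s, a, \mathbf{z}) = \theta_a [\text{one\_hot}(s); \mathbf{z}]\), a Gaussian \(\mathcal{D}_z(\cdot \mid \mathbf{w}) = \mathcal{N}(\mathbf{w}, \sigma^2 I)\), a hypothetical random toy MDP, and a fixed step size \(\alpha\). The helper names `make_toy_mdp`, `inputs`, `psi`, `gpi_action` and `learn_usfa` are hypothetical. Comments map each line to the step numbers of Algorithm 1.

```python
import numpy as np


def make_toy_mdp(n_states, n_actions, d, rng):
    """Hypothetical toy MDP: P[s, a] is a next-state distribution and
    phi[s, a] is the d-dimensional feature observed after taking a in s."""
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    phi = rng.normal(size=(n_states, n_actions, d))
    return P, phi


def inputs(s, z, n_states):
    """Assumed linear-USFA input x(s, z) = [one_hot(s), z]."""
    x = np.zeros(n_states + z.shape[0])
    x[s] = 1.0
    x[n_states:] = z
    return x


def psi(theta, s, z, n_states):
    """psi_tilde(s, b, z) for every action b; theta has shape (A, d, S + d)."""
    return theta @ inputs(s, z, n_states)  # shape (A, d)


def gpi_action(theta, s, zs, w, n_states):
    """Step 7 (GPI): argmax_b max_i psi_tilde(s, b, z_i)^T w."""
    q = np.stack([psi(theta, s, z, n_states) @ w for z in zs])  # (n_z, A)
    return int(np.argmax(q.max(axis=0)))


def learn_usfa(P, phi, tasks, n_steps, n_z=5, eps=0.1, gamma=0.9,
               alpha=0.05, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_states, n_actions, d = phi.shape
    theta = np.zeros((n_actions, d, n_states + d))
    s = int(rng.integers(n_states))                      # step 1
    for _ in range(n_steps):                             # step 2
        w = tasks[rng.integers(len(tasks))]              # step 3
        # Steps 4-5: D_z(.|w) is assumed to be N(w, sigma^2 I).
        zs = [w + sigma * rng.normal(size=d) for _ in range(n_z)]
        if rng.random() < eps:                           # step 6: Bernoulli(eps) = 1
            a = int(rng.integers(n_actions))
        else:                                            # step 7: GPI
            a = gpi_action(theta, s, zs, w, n_states)
        f = phi[s, a]                                    # step 8: observe phi and s'
        s_next = int(rng.choice(n_states, p=P[s, a]))
        for z in zs:                                     # step 9
            psi_next = psi(theta, s_next, z, n_states)   # (A, d)
            a_next = int(np.argmax(psi_next @ z))        # step 10: a' = pi_i(s')
            x = inputs(s, z, n_states)
            # Vector-valued TD error; the target is treated as a constant.
            delta = f + gamma * psi_next[a_next] - theta[a] @ x
            # Step 11: the linear model's gradient makes J^T delta = outer(delta, x).
            theta[a] += alpha * np.outer(delta, x)
        s = s_next                                       # step 12
    return theta                                         # step 13
```

For example, `P, phi = make_toy_mdp(4, 2, 3, np.random.default_rng(1))` followed by `learn_usfa(P, phi, np.eye(3), n_steps=300)` trains on three one-hot tasks. The linear form keeps the gradient in step 11 explicit; with a neural \(\tilde{\boldsymbol{\psi}}\) the same semi-gradient update would come from automatic differentiation instead.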




Source: Universal Successor Features Approximators, p. 6

