
SFs allow one to immediately compute the value of a policy π on any task w: it is easy to show that, when (1) holds, Q^π_w(s, a) = ψ^π(s, a)^⊤ w. It is also easy to see that SFs satisfy a Bellman equation in which φ plays the role of the reward, so ψ can be learned using any RL method (Szepesvári, 2010).
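The Bellman relation above can be sketched in code: a minimal tabular TD-style update for ψ in which the feature vector φ takes the place of the scalar reward, followed by the inner product ψ^⊤ w to recover Q for any task. All sizes, names, and the learning rate here are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Toy dimensions (assumed for illustration only).
n_states, n_actions, d = 4, 2, 3
gamma, alpha = 0.9, 0.1  # discount and learning rate (assumed)

# Successor features ψ^π(s, a), one d-dimensional vector per (s, a).
psi = np.zeros((n_states, n_actions, d))

def td_update(s, a, phi, s_next, a_next):
    """One Bellman backup for ψ: φ plays the role of the reward,
    and ψ(s', π(s')) plays the role of the next-state value."""
    target = phi + gamma * psi[s_next, a_next]
    psi[s, a] += alpha * (target - psi[s, a])

def q_value(s, a, w):
    """Q^π_w(s, a) = ψ^π(s, a)^⊤ w, for any task vector w."""
    return psi[s, a] @ w
```

Because ψ is learned once per policy and is independent of w, evaluating the same policy on a new task reduces to a single dot product with the new w.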
Source: Universal Successor Features Approximators, p. 3