
SFs allow one to immediately compute the value of a policy π on any task w: it is easy to show that, when (1) holds, Q^π_w(s, a) = ψ^π(s, a)^⊤ w. It is also easy to see that SFs satisfy a Bellman equation in which φ plays the role of the reward, so ψ can be learned using any RL method (Szepesvári, 2010).
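The Bellman relation above can be sketched in code: a minimal tabular TD-style update for ψ in which the feature vector φ takes the place of the scalar reward, followed by the inner product ψ^⊤ w to recover Q for any task. All sizes, names, and the learning rate here are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Toy dimensions (assumed for illustration only).
n_states, n_actions, d = 4, 2, 3
gamma, alpha = 0.9, 0.1  # discount and learning rate (assumed)

# Successor features ψ^π(s, a), one d-dimensional vector per (s, a).
psi = np.zeros((n_states, n_actions, d))

def td_update(s, a, phi, s_next, a_next):
    """One Bellman backup for ψ: φ plays the role of the reward,
    and ψ(s', π(s')) plays the role of the next-state value."""
    target = phi + gamma * psi[s_next, a_next]
    psi[s, a] += alpha * (target - psi[s, a])

def q_value(s, a, w):
    """Q^π_w(s, a) = ψ^π(s, a)^⊤ w, for any task vector w."""
    return psi[s, a] @ w
```

Because ψ is learned once per policy and is independent of w, evaluating the same policy on a new task reduces to a single dot product with the new w.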
Source: Universal Successor Features Approximators, p. 3