Question
How does a universal value function approximator (UVFA) generalise across the space of tasks in an environment? What assumptions could one make when implementing one?
Answer
Generalisation is achieved by modelling the shape of the universal optimal value function Q*(s, a, w), where w describes the task (e.g. a goal). By using a neural network, for example, to represent an approximation Q~(s, a, w), one implicitly assumes that Q*(s, a, w) is smooth in the space of tasks; roughly speaking, this means that small perturbations to the task description w will result in only small changes in Q*(s, a, w).
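A minimal sketch of the idea, assuming a toy two-layer MLP with made-up dimensions (none of this is from the paper): the task vector w is fed to the network alongside the state s, so a single smooth function approximator covers the whole task space, and nearby tasks get nearby value estimates.

```python
import numpy as np

# Hypothetical UVFA sketch: one network Q~(s, a, w) that takes the state s AND
# a task description w as input, so it can generalise across tasks.
# The dimensions and the two-layer MLP are illustrative assumptions.

rng = np.random.default_rng(0)
STATE_DIM, TASK_DIM, N_ACTIONS, HIDDEN = 4, 3, 2, 32

# Randomly initialised weights stand in for a trained approximator.
W1 = rng.normal(scale=0.1, size=(STATE_DIM + TASK_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS))

def q_values(s, w):
    """Return Q~(s, a, w) for all actions a, given state s and task vector w."""
    x = np.concatenate([s, w])   # the task description is just another input
    h = np.tanh(x @ W1)          # smooth activations -> output smooth in w
    return h @ W2

s = rng.normal(size=STATE_DIM)
w = rng.normal(size=TASK_DIM)

# Smoothness assumption in action: a small perturbation of the task
# description w should change the predicted values only slightly.
q_a = q_values(s, w)
q_b = q_values(s, w + 1e-3 * rng.normal(size=TASK_DIM))
print(np.max(np.abs(q_a - q_b)))  # small difference for a smooth network
```

Because the network is a composition of smooth functions, it can only represent value functions that vary smoothly in w; that is the implicit assumption the answer refers to.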
Tags
#reinforcement-learning
Source: Universal Successor Features Approximators, p. 3