


#reinforcement-learning
As mentioned in the introduction, in this paper we are interested in the multitask RL scenario, where the agent has to solve multiple tasks. Each task is defined by a reward function \(R_{\mathbf{w}}\); thus, instead of a single MDP \(M\), our environment is a set of MDPs that share the same structure except for the reward function. Following Barreto et al. (2017), we assume that the expected one-step reward associated with transition \(s \stackrel{a}{\rightarrow} s^{\prime}\) is given by \(\mathrm{E}\left[R_{\mathbf{w}}\left(s, a, s^{\prime}\right)\right]=r_{\mathbf{w}}\left(s, a, s^{\prime}\right)=\phi\left(s, a, s^{\prime}\right)^{\top} \mathbf{w}\), where \(\phi\left(s, a, s^{\prime}\right) \in \mathbb{R}^{d}\) are features of \(\left(s, a, s^{\prime}\right)\) and \(\mathbf{w} \in \mathbb{R}^{d}\) are weights.
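
To make the linear reward decomposition concrete, here is a minimal NumPy sketch; the feature vector and task weights below are made-up illustrative values, not from the paper. It shows that, for a fixed transition feature vector \(\phi\left(s, a, s^{\prime}\right)\), changing only the weight vector \(\mathbf{w}\) yields a different task's reward.

import numpy as np

# Illustrative example (values assumed, not from the paper):
# each task is defined solely by a weight vector w, and the expected
# one-step reward is r_w(s, a, s') = phi(s, a, s')^T w.

phi = np.array([0.2, 0.0, 1.0, 0.5])   # features of one transition (s, a, s'), d = 4

# Three tasks sharing the same environment dynamics, differing only in w.
tasks = {
    "task_A": np.array([1.0, 0.0, 0.0, 0.0]),
    "task_B": np.array([0.0, 0.5, 0.5, 0.0]),
    "task_C": np.array([0.0, 0.0, 0.0, 2.0]),
}

for name, w in tasks.items():
    r = phi @ w                         # r_w(s, a, s') = phi(s, a, s')^T w
    print(f"{name}: reward = {r:.2f}")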
Source: Universal Successor Features Approximators, p. 2