
As mentioned in the introduction, in this paper we are interested in the multitask RL scenario, where the agent has to solve multiple tasks. Each task is defined by a reward function \(R_{\mathbf{w}}\); thus, instead of a single MDP \(M\), our environment is a set of MDPs that share the same structure except for the reward function. Following Barreto et al. (2017), we assume that the expected one-step reward associated with transition \(s \stackrel{a}{\rightarrow} s^{\prime}\) is given by \(\mathrm{E}\left[R_{\mathbf{w}}\left(s, a, s^{\prime}\right)\right]=r_{\mathbf{w}}\left(s, a, s^{\prime}\right)=\phi\left(s, a, s^{\prime}\right)^{\top} \mathbf{w}\), where \(\phi\left(s, a, s^{\prime}\right) \in \mathbb{R}^{d}\) are features of \(\left(s, a, s^{\prime}\right)\) and \(\mathbf{w} \in \mathbb{R}^{d}\) are weights.
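The linear reward assumption can be illustrated with a minimal sketch, which is not taken from the paper. The helper `make_reward`, the feature map `phi`, and the two example weight vectors are all hypothetical and chosen only for illustration: the features are shared across tasks, and each task differs only in its weight vector \(\mathbf{w}\).

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def make_reward(w: Vector, phi: Callable[..., Vector]) -> Callable[..., float]:
    """Return r_w(s, a, s') = phi(s, a, s')^T w for a fixed weight vector w."""

    def reward(s, a, s_next) -> float:
        features = phi(s, a, s_next)
        if len(features) != len(w):
            raise ValueError("phi(s, a, s') and w must both have dimension d")
        # Inner product phi(s, a, s')^T w.
        return sum(f * w_i for f, w_i in zip(features, w))

    return reward


# Hypothetical feature map with d = 2 (an assumption for illustration only):
# each feature indicates which kind of object is reached in the next state.
def phi(s, a, s_next) -> Vector:
    return [
        1.0 if s_next == "apple" else 0.0,
        1.0 if s_next == "pear" else 0.0,
    ]


# Two tasks that share phi and differ only in w.
r_apples = make_reward([1.0, 0.0], phi)       # reward apples, ignore pears
r_avoid_pears = make_reward([1.0, -1.0], phi)  # reward apples, penalize pears
```

Under these assumptions, `r_apples("start", "right", "pear")` and `r_avoid_pears("start", "right", "pear")` differ only because of the second component of the weight vector, which is the sense in which a task is identified with \(\mathbf{w}\).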


Source: Universal Successor Features Approximators, p. 2




