#reinforcement-learning
As mentioned in the introduction, in this paper we are interested in the multitask RL scenario, where the agent has to solve multiple tasks. Each task is defined by a reward function $$R_{\mathbf{w}}$$; thus, instead of a single MDP $$M$$, our environment is a set of MDPs that share the same structure except for the reward function. Following Barreto et al. (2017), we assume that the expected one-step reward associated with transition $$s \stackrel{a}{\rightarrow} s^{\prime}$$ is given by $$\mathrm{E}\left[R_{\mathbf{w}}\left(s, a, s^{\prime}\right)\right]=r_{\mathbf{w}}\left(s, a, s^{\prime}\right)=\phi\left(s, a, s^{\prime}\right)^{\top} \mathbf{w}$$, where $$\phi\left(s, a, s^{\prime}\right) \in \mathbb{R}^{d}$$ are features of $$\left(s, a, s^{\prime}\right)$$ and $$\mathbf{w} \in \mathbb{R}^{d}$$ are weights.
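To make the linear reward decomposition concrete, here is a minimal NumPy sketch. The feature function `phi` and the task weight vectors are hypothetical placeholders, not taken from the paper; the point is only that a single shared $$\phi$$ plus a per-task $$\mathbf{w}$$ determines each task's reward via a dot product.

```python
import numpy as np

# Hypothetical d = 3 features of a transition (s, a, s'), shared by all tasks.
def phi(s, a, s_next):
    # Placeholder features; in practice these are hand-designed or learned.
    return np.array([float(s_next == 0), float(s_next == 1), float(a)])

# Each task is defined solely by its weight vector w in R^d.
tasks = {
    "reach_state_0": np.array([1.0, 0.0, 0.0]),
    "reach_state_1": np.array([0.0, 1.0, 0.0]),
}

s, a, s_next = 0, 1, 1
features = phi(s, a, s_next)
for name, w in tasks.items():
    # Expected one-step reward: r_w(s, a, s') = phi(s, a, s')^T w.
    print(name, features @ w)
```

Note that changing the task requires changing only $$\mathbf{w}$$; the features, and hence the environment dynamics and structure, stay fixed.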

#### pdf

Source: Universal Successor Features Approximators, p. 2