#reinforcement-learning
As mentioned in the introduction, in this paper we are interested in the multitask RL scenario, where the agent has to solve multiple tasks. Each task is defined by a reward function \(R_{\mathbf{w}}\); thus, instead of a single MDP \(M\), our environment is a set of MDPs that share the same structure except for the reward function. Following Barreto et al. (2017), we assume that the expected one-step reward associated with transition \(s \stackrel{a}{\rightarrow} s^{\prime}\) is given by \(\mathrm{E}\left[R_{\mathbf{w}}\left(s, a, s^{\prime}\right)\right]=r_{\mathbf{w}}\left(s, a, s^{\prime}\right)=\phi\left(s, a, s^{\prime}\right)^{\top} \mathbf{w}\), where \(\phi\left(s, a, s^{\prime}\right) \in \mathbb{R}^{d}\) are features of \((s, a, s^{\prime})\) and \(\mathbf{w} \in \mathbb{R}^{d}\) are weights.
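To make the decomposition concrete, here is a minimal NumPy sketch (not from the paper) of the assumed reward structure \(r_{\mathbf{w}}(s, a, s^{\prime}) = \phi(s, a, s^{\prime})^{\top} \mathbf{w}\). The feature map `phi` and the weight vectors `w_task1`, `w_task2` are hypothetical values invented purely for illustration.

```python
# Illustrative sketch (assumed values, not from the paper): every task's reward
# is linear in shared transition features phi(s, a, s'), so switching tasks only
# means switching the weight vector w.
import numpy as np

def phi(s, a, s_next):
    """Hypothetical feature map phi(s, a, s') in R^d (here d = 3).
    In practice these features are given or learned; here they are hard-coded."""
    return np.array([1.0, 0.5, -2.0])

# Two tasks share the same features but use different weights w in R^d.
w_task1 = np.array([1.0, 0.0, 0.0])   # task 1 rewards only the first feature
w_task2 = np.array([0.0, 2.0, 1.0])   # task 2 weights the other two features

features = phi("s", "a", "s_next")
r1 = features @ w_task1   # expected one-step reward under task 1: phi^T w
r2 = features @ w_task2   # expected one-step reward under task 2: phi^T w
print(r1, r2)             # 1.0 and -1.0
```

The point of the assumption is that the features \(\phi\) are shared across all tasks, so a task is fully specified by its \(d\)-dimensional weight vector \(\mathbf{w}\).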
Source: Universal Successor Features Approximators, p. 2