What are the successor features of a state-action pair (s, a) under policy π?
Answer
The successor features (SFs) \(\boldsymbol{\psi}^{\pi}(s, a) \in \mathbb{R}^{d}\) of a state-action pair \((s, a)\) under policy \(\pi\) are given by \(\boldsymbol{\psi}^{\pi}(s, a) \equiv \mathrm{E}^{\pi}\left[\sum_{i=t}^{\infty} \gamma^{i-t} \boldsymbol{\phi}_{i+1} \mid S_{t}=s, A_{t}=a\right]\), where \(\gamma \in [0, 1)\) is the discount factor and \(\boldsymbol{\phi}_{i+1} \in \mathbb{R}^{d}\) are the features of the transition \((S_i, A_i, S_{i+1})\).
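To make the definition concrete, here is a minimal sketch that evaluates the expectation exactly for a small tabular MDP. It assumes the transition probabilities are known and the state and action sets are finite, so the infinite discounted sum reduces to a linear system. The function name `successor_features` and the arrays `P`, `phi`, and `pi` are hypothetical names chosen for illustration; this only illustrates the definition and is not the method of any particular paper.

```python
import numpy as np


def successor_features(P, phi, pi, gamma):
    """Closed-form successor features for a small tabular MDP (illustrative).

    P:     (S, A, S) array, P[s, a, s2] = probability of s2 given (s, a).
    phi:   (S, A, S, d) array of transition features phi(s, a, s2).
    pi:    (S, A) array, pi[s, a] = probability of action a in state s.
    gamma: discount factor in [0, 1).
    Returns psi with shape (S, A, d).
    """
    S, A, _ = P.shape
    d = phi.shape[-1]
    # Expected one-step feature for each (s, a).
    phi_bar = np.einsum("sap,sapd->sad", P, phi).reshape(S * A, d)
    # Transition matrix between state-action pairs when following pi.
    P_pi = np.einsum("sap,pb->sapb", P, pi).reshape(S * A, S * A)
    # The discounted sum equals the solution of (I - gamma * P_pi) psi = phi_bar.
    psi = np.linalg.solve(np.eye(S * A) - gamma * P_pi, phi_bar)
    return psi.reshape(S, A, d)


# Hypothetical toy problem: 3 states, 2 actions, 4-dimensional features.
rng = np.random.default_rng(0)
S, A, d, gamma = 3, 2, 4, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=-1, keepdims=True)   # normalise rows into probabilities
phi = rng.random((S, A, S, d))
pi = np.full((S, A), 1.0 / A)        # uniform random policy

psi = successor_features(P, phi, pi, gamma)
print(psi.shape)
```

Each `psi[s, a]` is a vector in \(\mathbb{R}^{d}\): the expected discounted sum of future features when starting from \((s, a)\) and following \(\pi\) afterwards.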
Tags
#reinforcement-learning
Source: Universal Successor Features Approximators, p. 3