#reinforcement-learning

What are the successor features of a state-action pair (s, a) under policy π?

The successor features (SFs) \(\boldsymbol{\psi}^{\pi}(s, a) \in \mathbb{R}^{d}\) of a state-action pair \((s, a)\) under policy \(\pi\) are defined as \(\boldsymbol{\psi}^{\pi}(s, a) \equiv \mathrm{E}^{\pi}\left[\sum_{i=t}^{\infty} \gamma^{i-t} \boldsymbol{\phi}_{i+1} \mid S_{t}=s, A_{t}=a\right]\), where \(\gamma\) is the discount factor and each \(\boldsymbol{\phi}_{i+1} \in \mathbb{R}^{d}\) is the feature vector of the transition \((S_i, A_i, S_{i+1})\).
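
To make the definition concrete, here is a minimal sketch of a Monte Carlo estimate: compute the discounted sum of feature vectors along each sampled trajectory that starts from the same \((s, a)\) and follows \(\pi\), then average. The function names, the toy trajectories, and \(\gamma = 0.5\) are all hypothetical, chosen only for illustration. Note that finite trajectories truncate the infinite sum.

```python
from typing import List, Sequence


def discounted_feature_sum(phis: Sequence[Sequence[float]], gamma: float) -> List[float]:
    """Return sum_k gamma^k * phis[k], where phis[k] plays the role of phi_{t+k+1}."""
    d = len(phis[0])
    total = [0.0] * d
    discount = 1.0  # gamma^0 for the first transition
    for phi in phis:
        for j in range(d):
            total[j] += discount * phi[j]
        discount *= gamma
    return total


def mc_successor_features(
    trajectories: Sequence[Sequence[Sequence[float]]], gamma: float
) -> List[float]:
    """Average the discounted feature sums over sampled trajectories."""
    sums = [discounted_feature_sum(traj, gamma) for traj in trajectories]
    n = len(sums)
    return [sum(s[j] for s in sums) / n for j in range(len(sums[0]))]


# Two hypothetical trajectories from the same (s, a), with d = 2 features.
trajectories = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]
psi_hat = mc_successor_features(trajectories, gamma=0.5)
print(psi_hat)  # [1.25, 0.5]
```

Each component of the estimate is the expected discounted amount of the corresponding feature that the agent accumulates after taking \(a\) in \(s\) and then following \(\pi\).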

