We consider a broader class of Bellman equations that are non-linear in the rewards and future values: \(v(s)=\mathbb{E}\left[f\left(R_{t+1}, v\left(S_{t+1}\right)\right) | S_{t}=s, A_{t} \sim \pi\left(S_{t}\right)\right]\) .
status | not read | reprioritisations | ||
---|---|---|---|---|
last reprioritisation on | suggested re-reading day | |||
started reading on | finished reading on |