#computer-science #machine-learning #reinforcement-learning
The Bellman error for a state is the expected TD error in that state. So let's repeat the derivation above with the expected TD error (all expectations here are implicitly conditional on $S_t$):

$$
\begin{aligned}
\mathbf{w}_{t+1} &= \mathbf{w}_t - \tfrac{1}{2}\alpha \nabla\!\left(\mathbb{E}_\pi[\delta_t]^2\right) \\
&= \mathbf{w}_t - \tfrac{1}{2}\alpha \nabla\!\left(\mathbb{E}_b[\rho_t \delta_t]^2\right) \\
&= \mathbf{w}_t - \alpha\, \mathbb{E}_b[\rho_t \delta_t]\, \nabla \mathbb{E}_b[\rho_t \delta_t] \\
&= \mathbf{w}_t - \alpha\, \mathbb{E}_b\!\left[\rho_t\left(R_{t+1} + \gamma\hat v(S_{t+1},\mathbf{w}) - \hat v(S_t,\mathbf{w})\right)\right] \mathbb{E}_b\!\left[\rho_t \nabla \delta_t\right] \\
&= \mathbf{w}_t + \alpha\left[\mathbb{E}_b\!\left[\rho_t\left(R_{t+1} + \gamma\hat v(S_{t+1},\mathbf{w})\right)\right] - \hat v(S_t,\mathbf{w})\right]\left[\nabla\hat v(S_t,\mathbf{w}) - \gamma\,\mathbb{E}_b\!\left[\rho_t \nabla\hat v(S_{t+1},\mathbf{w})\right]\right].
\end{aligned}
$$

This update and various ways of sampling it are referred to as the residual-gradient algorithm.
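To make the sampled form concrete, here is a minimal sketch of one residual-gradient step with linear function approximation, $\hat v(s,\mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s)$. The function name, signature, and feature-vector representation are illustrative assumptions, not from the source. Because the final update above is a product of two expectations over $S_{t+1}$, an unbiased sample of it needs two independent samples of the next state (double sampling); reusing one sample is only unbiased when transitions are deterministic.

```python
import numpy as np

def residual_gradient_step(w, x_t, r, x_next, x_next_indep, alpha, gamma, rho=1.0):
    """One sampled residual-gradient update for linear v_hat(s, w) = w @ x(s).

    x_next and x_next_indep are feature vectors of two independent samples
    of S_{t+1}; passing the same vector for both is only unbiased when the
    environment's transitions are deterministic. rho is the importance
    sampling ratio (1.0 in the on-policy case).
    """
    # Sample of E_b[rho_t * delta_t]: the importance-weighted TD error.
    delta = rho * (r + gamma * w @ x_next - w @ x_t)
    # Sample of the second factor, using the *independent* next-state sample;
    # for linear v_hat the gradients are just the feature vectors.
    grad_factor = x_t - gamma * rho * x_next_indep
    return w + alpha * delta * grad_factor
```

The second factor is what distinguishes this from semi-gradient TD(0): the update also differentiates through $\hat v(S_{t+1},\mathbf{w})$ rather than treating the bootstrapped target as a constant.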