#computer-science #machine-learning #reinforcement-learning
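In the general function-approximation case, the one-step TD error with discounting is

$$\delta_t = R_{t+1} + \gamma\,\hat{v}(S_{t+1}, \mathbf{w}_t) - \hat{v}(S_t, \mathbf{w}_t).$$

A possible objective function then is what one might call the Mean Squared TD Error:

$$\overline{\mathrm{TDE}}(\mathbf{w}) = \sum_{s \in \mathcal{S}} \mu(s)\, \mathbb{E}\left[\delta_t^2 \mid S_t = s, A_t \sim \pi\right]$$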
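As a rough illustration of how this objective could be evaluated, the sketch below computes the Mean Squared TD Error exactly for a tiny, made-up Markov reward process. Everything in it is an assumption for illustration and not part of the note: the two-state `transitions` table (already averaged over a fixed policy), the one-hot `features`, the linear form of the value estimate, the state weighting `mu` (chosen to be the stationary distribution of this toy chain), and `gamma = 0.9`.

```python
from typing import Dict, List, Tuple

# Hypothetical toy problem: a two-state Markov reward process induced by a
# fixed policy pi. transitions[s] lists (probability, reward, next_state).
Transition = Tuple[float, float, int]
transitions: Dict[int, List[Transition]] = {
    0: [(0.5, 1.0, 0), (0.5, 0.0, 1)],
    1: [(1.0, 2.0, 0)],
}

# Assumed one-hot feature vectors x(s) and state weighting mu(s).
features: Dict[int, List[float]] = {0: [1.0, 0.0], 1: [0.0, 1.0]}
mu: Dict[int, float] = {0: 2.0 / 3.0, 1: 1.0 / 3.0}
gamma = 0.9  # assumed discount factor


def v_hat(state: int, w: List[float]) -> float:
    """Linear value estimate: the inner product of w and x(state)."""
    return sum(wi * xi for wi, xi in zip(w, features[state]))


def mean_squared_td_error(w: List[float]) -> float:
    """Sum over states of mu(s) * E[delta^2 | S_t = s] under the fixed policy."""
    total = 0.0
    for s, outcomes in transitions.items():
        expected_sq = 0.0
        for prob, reward, s_next in outcomes:
            # One-step TD error for this particular outcome.
            delta = reward + gamma * v_hat(s_next, w) - v_hat(s, w)
            expected_sq += prob * delta ** 2
        total += mu[s] * expected_sq
    return total


if __name__ == "__main__":
    print(mean_squared_td_error([0.0, 0.0]))
    print(mean_squared_td_error([1.0, 1.0]))
```

Because the expectation is taken over the square of the TD error, the randomness of the transitions contributes to the objective even when the value estimates are accurate, so in a stochastic problem like this toy one the minimum is generally not zero.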