
#computer-science #machine-learning #reinforcement-learning
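In the general function-approximation case, the one-step TD error with discounting is $\delta_t = R_{t+1} + \gamma \hat{v}(S_{t+1}, \mathbf{w}_t) - \hat{v}(S_t, \mathbf{w}_t)$. A possible objective function is then what one might call the Mean Squared TD Error: $\overline{\mathrm{TDE}}(\mathbf{w}) = \sum_{s \in \mathcal{S}} \mu(s)\, \mathbb{E}\left[\delta_t^2 \mid S_t = s, A_t \sim \pi\right]$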
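As a rough illustration of the two quantities above, here is a minimal sketch that assumes a linear value function, $\hat{v}(s, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s)$, and a batch of transitions whose states are assumed to be drawn from $\mu$ while following $\pi$. Under those assumptions the average of the squared TD errors is a sample estimate of the Mean Squared TD Error. The function names and the toy data are hypothetical and chosen only for this example.

```python
import numpy as np

def td_errors(rewards, features, next_features, w, gamma):
    """One-step TD errors for a linear value estimate v(s, w) = w . x(s).

    delta_t = R_{t+1} + gamma * v(S_{t+1}, w) - v(S_t, w)
    """
    v = features @ w            # v(S_t, w) for every transition
    v_next = next_features @ w  # v(S_{t+1}, w); all-zero features give 0 at terminal states
    return rewards + gamma * v_next - v

def mean_squared_td_error(rewards, features, next_features, w, gamma):
    """Sample estimate of the Mean Squared TD Error.

    Assumes the transitions' states are distributed according to mu
    and actions were chosen by pi, so a plain average is appropriate.
    """
    delta = td_errors(rewards, features, next_features, w, gamma)
    return float(np.mean(delta ** 2))

# Toy data (assumed for illustration): two one-hot states, the second one terminates.
features = np.array([[1.0, 0.0], [0.0, 1.0]])
next_features = np.array([[0.0, 1.0], [0.0, 0.0]])
rewards = np.array([0.0, 1.0])
w = np.array([0.5, 1.0])
gamma = 0.9

deltas = td_errors(rewards, features, next_features, w, gamma)
tde = mean_squared_td_error(rewards, features, next_features, w, gamma)
```

With these toy values the TD errors are 0.4 and 0.0, so the estimate is 0.08. If an explicit $\mu(s)$ and per-state expectations of $\delta_t^2$ were available, the weighted sum from the definition could be computed directly instead of the sample average.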