BuboFlash - helps with learning

#computer-science #machine-learning #reinforcement-learning

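In the general function-approximation case, the one-step TD error with discounting is $\delta_t = R_{t+1} + \gamma \hat{v}(S_{t+1}, \mathbf{w}_t) - \hat{v}(S_t, \mathbf{w}_t)$. A possible objective function then is what one might call the Mean Squared TD Error: $\overline{\mathrm{TDE}}(\mathbf{w}) = \sum_{s \in \mathcal{S}} \mu(s)\, \mathbb{E}\left[\delta_t^2 \mid S_t = s, A_t \sim \pi\right]$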
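
As a rough illustration, the Mean Squared TD Error can be estimated by averaging squared one-step TD errors over sampled transitions. The sketch below is not taken from the source: it assumes a linear value function (a dot product of a feature vector with the weights), a fixed weight vector `w`, and transitions collected while following the policy, so that states appear roughly in proportion to the weighting mu(s). The names `v_hat`, `td_error`, and `mean_squared_td_error` are hypothetical.

```python
from typing import Sequence, Tuple

# One on-policy transition: (features of S_t, reward R_{t+1}, features of S_{t+1}).
Transition = Tuple[Sequence[float], float, Sequence[float]]


def v_hat(x: Sequence[float], w: Sequence[float]) -> float:
    """Linear value estimate (an assumption): dot product of features and weights."""
    return sum(xi * wi for xi, wi in zip(x, w))


def td_error(x: Sequence[float], r: float, x_next: Sequence[float],
             w: Sequence[float], gamma: float) -> float:
    """One-step TD error: R_{t+1} + gamma * v_hat(S_{t+1}, w) - v_hat(S_t, w)."""
    return r + gamma * v_hat(x_next, w) - v_hat(x, w)


def mean_squared_td_error(transitions: Sequence[Transition],
                          w: Sequence[float], gamma: float) -> float:
    """Sample average of squared TD errors over the given transitions.

    If the transitions were gathered on-policy, this average approximates
    the mu-weighted expectation in the objective.
    """
    if not transitions:
        raise ValueError("need at least one transition")
    total = sum(td_error(x, r, x_next, w, gamma) ** 2
                for x, r, x_next in transitions)
    return total / len(transitions)


# Two states with one-hot features, so w holds one value estimate per state.
w = [1.0, 2.0]
gamma = 0.5
transitions = [
    ([1.0, 0.0], 1.0, [0.0, 1.0]),  # TD error: 1 + 0.5 * 2 - 1 = 1.0
    ([0.0, 1.0], 0.0, [1.0, 0.0]),  # TD error: 0 + 0.5 * 1 - 2 = -1.5
]
print(mean_squared_td_error(transitions, w, gamma))  # (1.0 + 2.25) / 2 = 1.625
```

A terminal next state can be represented here by an all-zero feature vector, which makes its estimated value zero under the linear assumption.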
