BuboFlash - helps with learning

Do you want BuboFlash to help you learning these things? Or do you want to add or correct something? Click here to log in or create user.

#computer-science #machine-learning #reinforcement-learning

One error t h at is always observable is that between the value estimate at each time and the return from that time. The Mean Square Return Error, denoted RE , is the expectation, unde r µ , of the square of this error. In the on-policy case the RE can be written RE(w)=E h G t ˆv(S t ,w) 2 i = VE(w)+E h G t v ⇡ (S t ) 2 i . (11.24)

If you want to change selection, open document below and click on "Move attachment"

pdf

cannot see any pdfs

Summary

status	not read	reprioritisations
last reprioritisation on		suggested re-reading day
started reading on		finished reading on

Details

Discussion

Do you want to join discussion? Click here to log in or create user.