#computer-science #machine-learning #reinforcement-learning
One error t h at is always observable is that between the value estimate at each time and the return from that time. The Mean Square Return Error, denoted RE , is the expectation, unde r µ , of the square of this error. In the on-policy case the RE can be written RE(w)=E h G t ˆv(S t ,w) 2 i = VE(w)+E h G t v ⇡ (S t ) 2 i . (11.24)
If you want to change selection, open document below and click on "Move attachment"
pdf
cannot see any pdfsSummary
status | not read | | reprioritisations | |
---|
last reprioritisation on | | | suggested re-reading day | |
---|
started reading on | | | finished reading on | |
---|
Details