#computer-science #machine-learning #reinforcement-learning
Thus, following the standard SGD approach, one can derive the per-step update based on a sample of this expected value:

$$
\begin{aligned}
\mathbf{w}_{t+1} &= \mathbf{w}_t - \tfrac{1}{2}\alpha\nabla\bigl(\rho_t\,\delta_t^2\bigr) \\
&= \mathbf{w}_t - \alpha\rho_t\,\delta_t\,\nabla\delta_t \\
&= \mathbf{w}_t + \alpha\rho_t\,\delta_t\,\bigl(\nabla\hat{v}(S_t,\mathbf{w}_t) - \gamma\,\nabla\hat{v}(S_{t+1},\mathbf{w}_t)\bigr), \qquad (11.23)
\end{aligned}
$$

which you will recognize as the same as the semi-gradient TD algorithm (11.2) except for the additional final term.
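As a minimal sketch of what update (11.23) computes, assuming a linear value function $\hat{v}(s,\mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s)$ (so the gradient is just the feature vector) and hypothetical names for the step-size, discount, and importance-sampling arguments:

```python
import numpy as np

def td_error_sgd_update(w, x_t, x_tp1, reward, rho, alpha=0.1, gamma=0.9):
    """One SGD step on the importance-weighted squared TD error, eq. (11.23),
    for a linear value function v_hat(s, w) = w @ x(s)."""
    v_t = w @ x_t      # v_hat(S_t, w_t)
    v_tp1 = w @ x_tp1  # v_hat(S_{t+1}, w_t)
    delta = reward + gamma * v_tp1 - v_t  # TD error delta_t
    # For linear approximation, grad v_hat(S, w) is the feature vector x(S),
    # so the update becomes w + alpha * rho * delta * (x_t - gamma * x_tp1).
    return w + alpha * rho * delta * (x_t - gamma * x_tp1)
```

Note the `- gamma * x_tp1` term: dropping it recovers the semi-gradient TD(0) update of (11.2), which treats the target as independent of $\mathbf{w}$.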