#computer-science #machine-learning #reinforcement-learning
The idea of storing some estimates separately and then combini n g them with samples is a good one and is also used in Gradient-TD methods. Gradient-TD methods estimate and store the product of the second two factors in (11.27) . These factors are a d ⇥ d matrix and a d -vector , so their pro du ct is j us t a d -vector , l ike w itself. We denote t h is second learned vector as v: v ⇡ E ⇥ x t x > t ⇤ 1 E[⇢ t t x t ]
If you want to change selection, open document below and click on "Move attachment"
pdf
cannot see any pdfsSummary
status | not read | | reprioritisations | |
---|
last reprioritisation on | | | suggested re-reading day | |
---|
started reading on | | | finished reading on | |
---|
Details