An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
#to-read
In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under off-policy training. The only prior model-free TD methods to achieve this with per-step computation linear in the number of function approximation parameters are the gradient-TD family of methods, including TDC, GTD(λ), and GQ(λ). Compared to these methods, our emphatic TD(λ) is simpler and easier to use; it has only one learned parameter vector and one step-size parameter. Our treatment includes general state-dependent discounting and bootstrapping functions, and a way of specifying varying degrees of interest in accurately valuing different states.
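
The abstract only names the pieces (one weight vector, one step size, state-dependent discounting and bootstrapping, an interest function), so the following is a minimal sketch of what one per-step update of linear emphatic TD(λ) could look like. The follow-on-trace and emphasis recursions, the variable names, and the function signature are assumptions drawn from the emphatic-TD literature, not details given on this page.

```python
import numpy as np

def emphatic_td_step(w, e, F, x, x_next, reward,
                     rho, rho_prev, gamma, gamma_next,
                     lam, interest, alpha):
    """One per-step update of linear emphatic TD(lambda) (illustrative sketch).

    w         : the single learned weight vector
    e         : eligibility-trace vector
    F         : scalar follow-on trace carried between steps
    x, x_next : feature vectors for the current and next state
    rho       : importance-sampling ratio pi(A_t|S_t) / mu(A_t|S_t)
    rho_prev  : the previous step's ratio (used in the follow-on recursion)
    gamma, gamma_next : state-dependent discounts gamma(S_t), gamma(S_{t+1})
    lam       : state-dependent bootstrapping parameter lambda(S_t)
    interest  : user-specified interest i(S_t) in valuing this state accurately
    alpha     : the single step-size parameter
    """
    # Follow-on trace: accumulated, discounted, importance-weighted interest.
    F = rho_prev * gamma * F + interest
    # Emphasis for this step mixes immediate interest with the follow-on trace.
    M = lam * interest + (1.0 - lam) * F
    # Emphasis-weighted accumulating eligibility trace.
    e = rho * (gamma * lam * e + M * x)
    # TD error with state-dependent discounting.
    delta = reward + gamma_next * np.dot(w, x_next) - np.dot(w, x)
    # Only one learned parameter vector and one step size are updated.
    w = w + alpha * delta * e
    return w, e, F
```

As a usage note, the caller would initialize w and e to zero vectors and F to the interest of the first state, then call this once per transition; setting every interest to 1 and every rho to 1 reduces the sketch to ordinary on-policy TD(λ).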