#computer-science #machine-learning #reinforcement-learning
In off-policy learning, we reweight the state transitions using importance sampling so that they become appropriate for learning about the target policy, but the state distribution is still that of the behavior policy. There is a mismatch. A natural idea is to somehow reweight the states, emphasizing some and de-emphasizing others, so as to return the distribution of updates to the on-policy distribution. There would then be a match, and stability and convergence would follow from existing results.
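The mismatch described above can be made concrete with a small sketch. It assumes a hypothetical two-state MDP with deterministic "stay"/"switch" actions, made-up target and behavior policies, and made-up per-transition update values `f(s, a)`. It computes the expected update exactly, without sampling, in three ways: (1) on-policy, (2) off-policy with only the transitions reweighted by the importance-sampling ratio ρ = π(a|s)/b(a|s), and (3) off-policy with each state additionally reweighted by d_π(s)/d_b(s). Only (3) matches (1). A real learner does not know d_π or d_b, so this illustrates the idea of reweighting states rather than giving a practical algorithm.

```python
# Sketch: state-distribution mismatch in off-policy learning (hypothetical MDP).
STATES = [0, 1]
ACTIONS = ["stay", "switch"]

# Hypothetical target (pi) and behavior (b) policies: action probabilities per state.
target = {0: {"stay": 0.9, "switch": 0.1}, 1: {"stay": 0.5, "switch": 0.5}}
behavior = {0: {"stay": 0.5, "switch": 0.5}, 1: {"stay": 0.5, "switch": 0.5}}


def next_state(s, a):
    """Deterministic dynamics: 'stay' keeps the state, 'switch' flips it."""
    return s if a == "stay" else 1 - s


def stationary_distribution(policy, iters=1000):
    """Power iteration on the state-transition matrix induced by `policy`."""
    d = [0.5, 0.5]
    for _ in range(iters):
        new_d = [0.0, 0.0]
        for s in STATES:
            for a in ACTIONS:
                new_d[next_state(s, a)] += d[s] * policy[s][a]
        d = new_d
    return d


def rho(s, a):
    """Per-transition importance-sampling ratio pi(a|s) / b(a|s)."""
    return target[s][a] / behavior[s][a]


d_pi = stationary_distribution(target)    # on-policy state distribution
d_b = stationary_distribution(behavior)   # behavior-policy state distribution

# Hypothetical per-transition "update" values f(s, a).
f = {(0, "stay"): 1.0, (0, "switch"): 0.0, (1, "stay"): 3.0, (1, "switch"): -1.0}

# Target quantity: expected update with states from d_pi and actions from pi.
on_policy = sum(d_pi[s] * target[s][a] * f[s, a] for s in STATES for a in ACTIONS)

# Transitions reweighted by rho, but states still drawn from d_b: mismatch.
transitions_only = sum(
    d_b[s] * behavior[s][a] * rho(s, a) * f[s, a] for s in STATES for a in ACTIONS
)

# Also reweighting each state by d_pi(s) / d_b(s) restores the on-policy expectation.
state_weight = [d_pi[s] / d_b[s] for s in STATES]
states_and_transitions = sum(
    d_b[s] * state_weight[s] * behavior[s][a] * rho(s, a) * f[s, a]
    for s in STATES
    for a in ACTIONS
)

print(f"on-policy:                {on_policy:.4f}")
print(f"transitions reweighted:   {transitions_only:.4f}")
print(f"states and transitions:   {states_and_transitions:.4f}")
```

Here the behavior policy visits both states equally often, while the target policy spends most of its time in state 0, so reweighting transitions alone averages the per-state expectations with the wrong weights.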