#computer-science #machine-learning #reinforcement-learning
A slightly better algorithm can be derived by doing a few more analytic steps before substituting in $\mathbf{v}_t$. Continuing from (11.29):
\begin{align*}
\mathbf{w}_{t+1}
&= \mathbf{w}_t + \alpha \, \mathbb{E}\!\left[\rho_t (\mathbf{x}_t - \gamma\mathbf{x}_{t+1})\mathbf{x}_t^\top\right] \mathbb{E}\!\left[\mathbf{x}_t\mathbf{x}_t^\top\right]^{-1} \mathbb{E}[\rho_t \delta_t \mathbf{x}_t] \\
&= \mathbf{w}_t + \alpha \left( \mathbb{E}\!\left[\rho_t \mathbf{x}_t\mathbf{x}_t^\top\right] - \gamma \, \mathbb{E}\!\left[\rho_t \mathbf{x}_{t+1}\mathbf{x}_t^\top\right] \right) \mathbb{E}\!\left[\mathbf{x}_t\mathbf{x}_t^\top\right]^{-1} \mathbb{E}[\rho_t \delta_t \mathbf{x}_t] \\
&= \mathbf{w}_t + \alpha \left( \mathbb{E}\!\left[\mathbf{x}_t\mathbf{x}_t^\top\right] - \gamma \, \mathbb{E}\!\left[\rho_t \mathbf{x}_{t+1}\mathbf{x}_t^\top\right] \right) \mathbb{E}\!\left[\mathbf{x}_t\mathbf{x}_t^\top\right]^{-1} \mathbb{E}[\rho_t \delta_t \mathbf{x}_t] \\
&= \mathbf{w}_t + \alpha \left( \mathbb{E}[\mathbf{x}_t \rho_t \delta_t] - \gamma \, \mathbb{E}\!\left[\rho_t \mathbf{x}_{t+1}\mathbf{x}_t^\top\right] \mathbb{E}\!\left[\mathbf{x}_t\mathbf{x}_t^\top\right]^{-1} \mathbb{E}[\rho_t \delta_t \mathbf{x}_t] \right) \\
&\approx \mathbf{w}_t + \alpha \left( \mathbb{E}[\mathbf{x}_t \rho_t \delta_t] - \gamma \, \mathbb{E}\!\left[\rho_t \mathbf{x}_{t+1}\mathbf{x}_t^\top\right] \mathbf{v}_t \right) && \text{(based on (11.28))} \\
&\approx \mathbf{w}_t + \alpha \rho_t \left( \delta_t \mathbf{x}_t - \gamma \, \mathbf{x}_{t+1}\mathbf{x}_t^\top \mathbf{v}_t \right), && \text{(sampling)}
\end{align*}
which again is $O(d)$ if the final product ($\mathbf{x}_t^\top \mathbf{v}_t$) is done first. This algorithm is known as either TD(0) with gradient correction (TDC) or, alternatively, as GTD(0).
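As a sketch of what one step of the sampled update might look like in code (the function and parameter names here are my own, not from the text; the secondary-weight step for $\mathbf{v}_t$ is the standard companion update associated with (11.28)):

```python
import numpy as np

def tdc_update(w, v, x, x_next, reward, rho,
               alpha=0.01, beta=0.02, gamma=0.9):
    """One sampled TDC / GTD(0) step; each line is O(d).

    w      : main weight vector (value estimate is w @ x)
    v      : secondary weights approximating
             E[x x^T]^{-1} E[rho * delta * x]
    rho    : importance-sampling ratio pi(A|S) / b(A|S)
    """
    # TD error for linear value estimates w @ x
    delta = reward + gamma * (w @ x_next) - (w @ x)
    # Compute the scalar x^T v first so the correction term is a
    # scalar times x_next, keeping the whole update O(d).
    w_new = w + alpha * rho * (delta * x - gamma * x_next * (x @ v))
    # Secondary weights track the least-squares solution via an
    # LMS-style rule on the TD error.
    v_new = v + beta * rho * (delta - (v @ x)) * x
    return w_new, v_new
```

Doing `x @ v` before multiplying by `x_next` is exactly the "final product first" point in the text: it avoids ever forming the $d \times d$ outer product $\mathbf{x}_{t+1}\mathbf{x}_t^\top$.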