#reinforcement-learning

Humans and animals seem to weight the future differently from what the standard linear Bellman equation implies. Unrolled over multiple steps, that equation yields exponential discounting, because a reward's contribution is multiplied by γ once per step. One consequence of the human weighting is that the preference ordering of two different rewards occurring at different times can reverse, depending on how far in the future the first reward is. For instance, humans may prefer a single sparse reward of +1 (e.g., $1) now over a reward of +2 (e.g., $2) received a week later, but may also prefer a reward of +2 received after 20 weeks over a reward of +1 after 19 weeks. Exponential discounting cannot produce such a reversal: delaying both rewards by the same 19 weeks multiplies both of their values by the same factor, so their ordering is unchanged.
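A minimal Python sketch of the reversal, under stated assumptions. Unrolling the linear Bellman equation v(s) = E[R + γ v(S′)] weights a reward k steps ahead by γ^k, which `exponential_value` models. The passage does not name a specific model for the human weighting; here we assume the hyperbolic form r / (1 + k·t), a common choice in the behavioural literature, purely for illustration. The function names, the one-week time step, and the parameter values `GAMMA = 0.4` and `K = 2.0` are all hypothetical.

```python
from typing import Callable


def exponential_value(reward: float, delay: float, gamma: float) -> float:
    """Exponential discounting: the weight gamma**delay comes from unrolling the linear Bellman equation."""
    return reward * gamma ** delay


def hyperbolic_value(reward: float, delay: float, k: float) -> float:
    """Hyperbolic discounting r / (1 + k * delay); an assumed model, not taken from the source."""
    return reward / (1.0 + k * delay)


def prefers_smaller_sooner(
    value_fn: Callable[..., float], t_small: float, t_large: float, **params: float
) -> bool:
    """True if +1 at time t_small is valued above +2 at time t_large under value_fn."""
    return value_fn(1.0, t_small, **params) > value_fn(2.0, t_large, **params)


# Hypothetical parameters; all delays are measured in weeks.
GAMMA = 0.4  # per-week discount factor (assumed)
K = 2.0      # hyperbolic discount rate per week (assumed)

for name, fn, params in [
    ("exponential", exponential_value, {"gamma": GAMMA}),
    ("hyperbolic", hyperbolic_value, {"k": K}),
]:
    near = prefers_smaller_sooner(fn, 0, 1, **params)   # +1 now vs +2 in one week
    far = prefers_smaller_sooner(fn, 19, 20, **params)  # +1 at 19 weeks vs +2 at 20 weeks
    print(f"{name}: prefers +1 now: {near}; prefers +1 at week 19: {far}")

# exponential: True / True (no reversal); hyperbolic: True / False (reversal)
```

With these values both models prefer +1 now over +2 next week, but only the hyperbolic model switches to preferring +2 at 20 weeks over +1 at 19 weeks, reproducing the reversal described in the passage. Under exponential discounting the comparison at weeks 19 and 20 is just the week-0/week-1 comparison scaled by γ^19, so it can never flip.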

Source: General non-linear Bellman equations, p. 2
