#reinforcement-learning
Humans and animals seem to weight the future differently than the standard linear Bellman equation implies: unrolling that equation over multiple steps multiplies by γ repeatedly, which yields exponential discounting. One consequence of non-exponential discounting is that the preference ordering of two different rewards occurring at different times can reverse, depending on how far in the future the first reward is. For instance, humans may prefer a single sparse reward of +1 (e.g., $1) now over a reward of +2 (e.g., $2) received a week later, but may also prefer a reward of +2 received after 20 weeks over a reward of +1 received after 19 weeks.
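For concreteness, a minimal sketch (in Python, not from the paper) of why this reversal can happen under hyperbolic but not exponential discounting. Unrolling the linear Bellman equation gives the value $v(s) = \mathbb{E}\left[\sum_{k \ge 0} \gamma^k R_{t+k+1}\right]$, so a reward $r$ at delay $t$ is worth $\gamma^t r$; the hyperbolic discount $1/(1+kt)$ and the parameter values $\gamma = 0.4$, $k = 2$ below are illustrative assumptions chosen to reproduce the example above.

```python
# Compare exponential and hyperbolic discounting on the
# "$1 now vs. $2 a week later" example. Parameter values are
# illustrative assumptions, not taken from the paper.

def exponential(t, gamma=0.4):
    """Discounting implied by the linear Bellman equation: gamma**t."""
    return gamma ** t

def hyperbolic(t, k=2.0):
    """Hyperbolic discounting: 1 / (1 + k*t)."""
    return 1.0 / (1.0 + k * t)

for discount in (exponential, hyperbolic):
    now = discount(0) * 1     # +1 received immediately
    soon = discount(1) * 2    # +2 received one week later
    late = discount(19) * 1   # +1 received after 19 weeks
    later = discount(20) * 2  # +2 received after 20 weeks
    print(f"{discount.__name__:>11}: "
          f"prefer +1 now over +2 in a week: {now > soon}; "
          f"prefer +2 at week 20 over +1 at week 19: {later > late}")

# Under exponential discounting the comparison is the same at every
# horizon, since the ratio (2 * gamma**(t+1)) / (1 * gamma**t) = 2*gamma
# does not depend on t: no reversal is possible. Under hyperbolic
# discounting the ratio 2*(1 + k*t) / (1 + k*(t+1)) grows with t,
# so the preference reverses, matching the behaviour described above.
```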
Source: General non-linear Bellman equations, p. 2