Tags
#reinforcement-learning
Question
Give an example of human preference ordering reversal that contradicts the use of exponential discounting in a reward function.
For instance, humans may prefer a single sparse reward of +1 (e.g., $1) now over a reward of +2 (e.g., $2) received a week later, but may also prefer a reward of +2 received after 20 weeks over a reward of +1 after 19 weeks.
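A minimal sketch of why this reversal is impossible under exponential discounting but arises naturally under hyperbolic discounting (a standard model of human time preference). The discount parameters γ = 0.4 and k = 2 are illustrative assumptions, not values from the source:

```python
def exp_value(r, t, gamma=0.4):
    # Exponential discounting: V = r * gamma^t.
    # The ratio of two options' values is independent of a shared
    # delay, so the preference ordering can never reverse.
    return r * gamma ** t

def hyp_value(r, t, k=2.0):
    # Hyperbolic discounting: V = r / (1 + k*t).
    # Discounting is steep for near rewards and shallow for far ones,
    # which allows preference reversal.
    return r / (1 + k * t)

# Option A: +1 at week t; Option B: +2 at week t+1.
for t in (0, 19):
    exp_pref = "A" if exp_value(1, t) > exp_value(2, t + 1) else "B"
    hyp_pref = "A" if hyp_value(1, t) > hyp_value(2, t + 1) else "B"
    print(f"t={t:2d}  exponential prefers {exp_pref},  hyperbolic prefers {hyp_pref}")
```

With these parameters the exponential agent prefers A at both horizons, while the hyperbolic agent prefers +1 now over +2 in a week, yet prefers +2 after 20 weeks over +1 after 19 weeks, reproducing the reversal described above.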

#### Parent (intermediate) annotation

…ing of the future than would emerge from the standard linear Bellman equation, which leads to exponential discounting when unrolled over multiple steps because of the repeated multiplication with γ. One consequence is that the preference ordering of two different rewards occurring at different times can reverse, depending on how far in the future the first reward is. For instance, humans may prefer a single sparse reward of +1 (e.g., $1) now over a reward of +2 (e.g., $2) received a week later, but may also prefer a reward of +2 received after 20 weeks over a reward of +1 after 19 weeks.

#### Original toplevel document (pdf)

General non-linear Bellman equations, p. 2
