#computer-science #machine-learning #reinforcement-learning
Figure 11.4: Causal relationships amon g the data distrib u t i o n , MDPs, and various objectives. Left, Monte Carlo objectives: Two di↵erent MDPs can produce the same d a t a distribution yet also produce d i↵erent VE s, proving that t h e VE objective cannot be determined from data and is not learnable. However, all su ch VE s must have the same optimal parameter vector, w ⇤ ! Moreover, this same w ⇤ can be determined from another object ive, the RE ,whichis uniquely determined from the data distribution. Thus w ⇤ and the RE are learnable even though the VE s are not. Right, Bootstrapping objectives: Two di↵erent MDPs can produce the same data distribution yet al so produce di↵ere nt BE s and have di↵erent minimizing parameter vectors; these are not learn a b le from the data distribution. The PBE and TDE objectives and their (di↵erent) minima can be directly determined from data and thus are learnable.
If you want to change selection, open document below and click on "Move attachment"
pdf
cannot see any pdfsSummary
| status | not read | | reprioritisations | |
|---|
| last reprioritisation on | | | suggested re-reading day | |
|---|
| started reading on | | | finished reading on | |
|---|
Details