#computer-science #machine-learning #reinforcement-learning
Figure 11.4: Causal relationships amon g the data distrib u t i o n , MDPs, and various objectives. Left, Monte Carlo objectives: Two di↵erent MDPs can produce the same d a t a distribution yet also produce d i↵erent VE s, proving that t h e VE objective cannot be determined from data and is not learnable. However, all su ch VE s must have the same optimal parameter vector, w ⇤ ! Moreover, this same w ⇤ can be determined from another object ive, the RE ,whichis uniquely determined from the data distribution. Thus w ⇤ and the RE are learnable even though the VE s are not. Right, Bootstrapping objectives: Two di↵erent MDPs can produce the same data distribution yet al so produce di↵ere nt BE s and have di↵erent minimizing parameter vectors; these are not learn a b le from the data distribution. The PBE and TDE objectives and their (di↵erent) minima can be directly determined from data and thus are learnable.
If you want to change selection, open document below and click on "Move attachment"
pdf
cannot see any pdfsSummary
status | not read | | reprioritisations | |
---|
last reprioritisation on | | | suggested re-reading day | |
---|
started reading on | | | finished reading on | |
---|
Details