Do you want BuboFlash to help you learning these things? Or do you want to add or correct something? Click here to log in or create user.



#computer-science #machine-learning #reinforcement-learning
Figure 11.4: Causal relationships amon g the data distrib u t i o n , MDPs, and various objectives. Left, Monte Carlo objectives: Two di↵erent MDPs can produce the same d a t a distribution yet also produce d i↵erent VE s, proving that t h e VE objective cannot be determined from data and is not learnable. However, all su ch VE s must have the same optimal parameter vector, w ⇤ ! Moreover, this same w ⇤ can be determined from another object ive, the RE ,whichis uniquely determined from the data distribution. Thus w ⇤ and the RE are learnable even though the VE s are not. Right, Bootstrapping objectives: Two di↵erent MDPs can produce the same data distribution yet al so produce di↵ere nt BE s and have di↵erent minimizing parameter vectors; these are not learn a b le from the data distribution. The PBE and TDE objectives and their (di↵erent) minima can be directly determined from data and thus are learnable.
If you want to change selection, open document below and click on "Move attachment"

pdf

cannot see any pdfs


Summary

statusnot read reprioritisations
last reprioritisation on suggested re-reading day
started reading on finished reading on

Details



Discussion

Do you want to join discussion? Click here to log in or create user.