
In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
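
The claim that the deterministic policy gradient is "the expected gradient of the action-value function" can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: a scalar linear policy mu_theta(s) = theta * s, a known quadratic action-value function Q(s, a) = -(a - g*s)^2 with a hypothetical gain g = 2, and standard-normal states. The gradient is estimated by averaging, over sampled states, the product of the policy's gradient with respect to theta and the gradient of Q with respect to the action, evaluated at a = mu_theta(s).

```python
import numpy as np

def dpg_estimate(theta, states, target_gain=2.0):
    """Sample estimate of the deterministic policy gradient on a toy problem.

    Assumed setup (illustrative only):
      policy  mu_theta(s) = theta * s
      critic  Q(s, a)     = -(a - target_gain * s) ** 2   (known exactly)
    """
    actions = theta * states                          # a = mu_theta(s)
    dq_da = -2.0 * (actions - target_gain * states)   # grad_a Q(s, a) at a = mu_theta(s)
    dmu_dtheta = states                               # grad_theta mu_theta(s)
    # Average of grad_theta mu * grad_a Q over the sampled states.
    return np.mean(dmu_dtheta * dq_da)

rng = np.random.default_rng(0)
theta = 0.0
for _ in range(200):
    states = rng.normal(size=256)                  # fresh batch of sampled states
    theta += 0.1 * dpg_estimate(theta, states)     # gradient ascent on expected Q

print(theta)  # approaches target_gain, the action gain that maximises Q
```

Note that no actions are sampled when forming the estimate: the average runs over states only, which is one way to read the abstract's remark about more efficient estimation. In this sketch the critic is given in closed form; in an actor-critic setting it would have to be learned.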


owner: misha - (no access) - 2014-silver.pdf, p1




