Publication | Closed Access
PAC model-free reinforcement learning
409
Citations
7
References
2006
Year
Unknown Venue
Artificial IntelligenceModel-free LearningEngineeringMachine LearningData ScienceStochastic GameDeep Reinforcement LearningAction Model LearningEfficient Reinforcement LearningFinite StateComputer ScienceSequential Decision MakingRobot LearningMarkov Decision Process
The authors propose Delayed Q‑Learning, a new algorithm for finite‑state, finite‑action Markov Decision Processes. Delayed Q‑Learning learns from a single continuous experience stream without resets or parallel sampling. The algorithm is PAC, achieving near‑optimal performance with O(SA) space and only Õ(SA) suboptimal steps, proving efficient model‑free reinforcement learning is possible and offering lower storage, experience, and per‑step computation than prior PAC methods.
For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm---Delayed Q-Learning. We prove it is PAC, achieving near optimal performance except for Õ(SA) timesteps using O(SA) space, improving on the Õ(S2 A) bounds of best previous algorithms. This result proves efficient reinforcement learning is possible without learning a model of the MDP from experience. Learning takes place from a single continuous thread of experience---no resets nor parallel sampling is used. Beyond its smaller storage and experience requirements, Delayed Q-learning's per-experience computation cost is much less than that of previous PAC algorithms.
| Year | Citations | |
|---|---|---|
Page 1
Page 1