Publication | Closed Access
Reinforcement Learning in Finite MDPs: PAC Analysis
199 Citations · 43 References · Year: 2009
Keywords: Mathematical Programming, Near-optimal Behavior, Machine Learning, Engineering, Stochastic Optimization, Stochastic Game, Game Theory, Algorithmic Learning, Computational Complexity, Sequential Decision Making, Computer Science, Probability Theory, PAC-MDP Algorithms, Lower Bounds, Markov Decision Process, Operations Research
We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) from a polynomial number of samples. Algorithms with this property are called PAC-MDP; they include the well-known E3 and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state of the art by presenting bounds for the problem in a unified theoretical framework. A more refined analysis of upper and lower bounds is presented to yield insight into the differences between the model-free Delayed Q-learning and the model-based R-MAX.
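The core idea behind R-MAX, one of the model-based PAC-MDP algorithms named in the abstract, can be sketched roughly as follows: state-action pairs visited fewer than `m` times are treated optimistically as worth the maximum possible value, which drives systematic exploration. The toy 2-state MDP, the function name `rmax_values`, and all constants below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def rmax_values(counts, reward_sum, trans_counts, r_max, m, gamma, n_iter=200):
    """Value iteration on the R-MAX optimistic model (a sketch).

    Pairs with counts[s, a] < m are "unknown" and get the optimistic
    value r_max / (1 - gamma); known pairs use empirical estimates.
    """
    S, A = counts.shape
    v_max = r_max / (1.0 - gamma)
    Q = np.zeros((S, A))
    for _ in range(n_iter):
        V = Q.max(axis=1)                     # state values from previous sweep
        for s in range(S):
            for a in range(A):
                if counts[s, a] < m:
                    Q[s, a] = v_max           # optimism in the face of uncertainty
                else:
                    r_hat = reward_sum[s, a] / counts[s, a]
                    p_hat = trans_counts[s, a] / counts[s, a]
                    Q[s, a] = r_hat + gamma * p_hat @ V
    return Q

# Toy deterministic 2-state MDP (an assumption for illustration):
# action 1 moves toward state 1, where it also pays reward 1.
P = {(0, 0): (0, 0.0), (0, 1): (1, 0.0),
     (1, 0): (0, 0.0), (1, 1): (1, 1.0)}
S, A = 2, 2
counts = np.zeros((S, A))
reward_sum = np.zeros((S, A))
trans_counts = np.zeros((S, A, S))

s = 0
for _ in range(30):                           # act greedily w.r.t. optimistic Q
    Q = rmax_values(counts, reward_sum, trans_counts, r_max=1.0, m=1, gamma=0.9)
    a = int(Q[s].argmax())
    s_next, r = P[(s, a)]
    counts[s, a] += 1
    reward_sum[s, a] += r
    trans_counts[s, a, s_next] += 1
    s = s_next

print(Q[0].argmax(), Q[1].argmax())           # → 1 1 (policy heads to state 1)
```

With `m = 1` the agent tries every pair once, after which the greedy policy on the learned model is optimal for this toy chain; the paper's analysis concerns how large `m` (and hence the sample complexity) must be to guarantee near-optimality with high probability in general MDPs.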