A survey of some simulation-based algorithms for Markov decision processes

Abstract

Many problems modeled by Markov decision processes (MDPs) have very large state and/or action spaces, leading to the well-known curse of dimensionality, which makes solving the resulting models intractable. In other cases, the system of interest is complex enough that it is not feasible to specify some of the MDP model parameters explicitly, but simulated sample paths (e.g., of random state transitions and rewards) can be readily generated, albeit at a non-trivial computational cost. For these settings, we have developed various sampling and population-based numerical algorithms to overcome the computational difficulties of computing an optimal solution in terms of a policy and/or value function. Specific approaches presented in this survey include multi-stage adaptive sampling, evolutionary policy iteration, and evolutionary random policy search.
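To make the simulation-based setting concrete, the following is a minimal sketch in the spirit of multi-stage adaptive sampling, not the survey's exact algorithm. It assumes a UCB1-style index for choosing which action to simulate next, one-step rewards in [0, 1], and a small finite action set. The toy simulator `simulate`, the action set `ACTIONS`, the function name `adaptive_value`, and the `budget` parameter are all hypothetical names introduced for illustration.

```python
import math
import random

ACTIONS = (0, 1)  # hypothetical two-action toy problem


def simulate(state, action, rng):
    """Hypothetical simulator: returns (next_state, reward), reward in [0, 1]."""
    # Assumed dynamics: action 1 pays 1 with probability 0.8, action 0 with 0.3.
    p = 0.8 if action == 1 else 0.3
    reward = 1.0 if rng.random() < p else 0.0
    next_state = (state + action) % 3
    return next_state, reward


def adaptive_value(state, stages_left, budget, rng):
    """Estimate the finite-horizon value of `state` using `budget` simulated
    transitions at every visited state (requires budget >= len(ACTIONS))."""
    if stages_left == 0:
        return 0.0

    counts = {a: 0 for a in ACTIONS}
    totals = {a: 0.0 for a in ACTIONS}

    def sample(a):
        # One simulated transition plus a recursive estimate of the remainder.
        nxt, r = simulate(state, a, rng)
        totals[a] += r + adaptive_value(nxt, stages_left - 1, budget, rng)
        counts[a] += 1

    # Sample every action once so each index below is well defined.
    for a in ACTIONS:
        sample(a)

    # Spend the remaining budget on the action with the largest UCB1-style
    # index. Means are divided by stages_left so they stay in [0, 1].
    for n in range(len(ACTIONS), budget):
        best = max(
            ACTIONS,
            key=lambda a: totals[a] / counts[a] / stages_left
            + math.sqrt(2.0 * math.log(n) / counts[a]),
        )
        sample(best)

    # Assumed estimator: average of per-action means, weighted by how often
    # each action was sampled.
    n_total = sum(counts.values())
    return sum((counts[a] / n_total) * (totals[a] / counts[a]) for a in ACTIONS)


if __name__ == "__main__":
    print(adaptive_value(state=0, stages_left=2, budget=16, rng=random.Random(0)))
```

The number of simulated transitions grows roughly as `budget ** stages_left`, which is why the sketch is only practical for short horizons; the cost is independent of the size of the state space, since only states reached by simulation are ever visited.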
