Adaptive Dynamic Programming: An Introduction

TLDR

Iterative ADP algorithms fall into two classes: those that require an initial stable policy and those that do not. The latter are generally believed to require less computation, but they do not guarantee system stability during iteration. Recent studies have focused on the convergence analysis of these schemes. This article surveys recent research trends in adaptive/approximate dynamic programming, covering variations in the structure of ADP schemes, algorithmic developments, and applications, and it identifies topics for future investigation.

Abstract

In this article, we introduce some recent research trends in the field of adaptive/approximate dynamic programming (ADP). These include variations in the structure of ADP schemes, the development of ADP algorithms, and applications of ADP schemes. For ADP algorithms, the main point is that iterative ADP algorithms can be sorted into two classes: those that require an initial stable policy and those that do not. It is generally believed that the latter require less computation, at the cost of losing the guarantee of system stability during the iteration process. In addition, many recent papers have provided convergence analyses for the algorithms they develop. Finally, we point out some topics for future study.
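The two classes of iterative algorithms described above can be illustrated in the simplest setting where both have closed-form steps: a scalar, discrete-time linear-quadratic regulator. This is a minimal sketch and is not taken from the article. The plant parameters `A`, `B`, `Q`, `R` and all function names are hypothetical, and the linear-quadratic case stands in for the general nonlinear problems ADP targets.

Policy iteration must start from a stabilizing gain, because policy evaluation yields a finite cost only for a stable closed loop. Value iteration instead starts from a zero value function and needs no initial policy. However, its first gain (`k = 0` here) leaves this open-loop-unstable plant unstable.

```python
import math

# Hypothetical scalar plant (not from the article): x_{k+1} = A*x_k + B*u_k,
# stage cost Q*x^2 + R*u^2. A > 1, so the uncontrolled system (u = 0) is unstable.
A, B, Q, R = 1.5, 1.0, 1.0, 1.0


def closed_loop(k):
    """Closed-loop coefficient A - B*k under the linear policy u = -k*x."""
    return A - B * k


def is_stabilizing(k):
    """A linear policy is stabilizing iff |A - B*k| < 1."""
    return abs(closed_loop(k)) < 1.0


def greedy_gain(p):
    """Policy improvement: gain minimizing Q x^2 + R u^2 + p (A x + B u)^2."""
    return A * B * p / (R + B * B * p)


def evaluate(k):
    """Policy evaluation: solve p = Q + R*k^2 + (A - B*k)^2 * p for V(x) = p*x^2.

    The cost is finite only for a stabilizing gain, which is why this class
    of algorithms needs an initial stable policy.
    """
    if not is_stabilizing(k):
        raise ValueError(f"gain {k} is not stabilizing; the cost is unbounded")
    return (Q + R * k * k) / (1.0 - closed_loop(k) ** 2)


def policy_iteration(k0, tol=1e-10, max_iter=50):
    """Class 1: start from a stabilizing gain k0, alternate evaluation/improvement."""
    k, p_trace, stable_trace = k0, [], []
    for _ in range(max_iter):
        stable_trace.append(is_stabilizing(k))
        p_trace.append(evaluate(k))
        k_next = greedy_gain(p_trace[-1])
        if abs(k_next - k) < tol:
            return k_next, p_trace, stable_trace
        k = k_next
    return k, p_trace, stable_trace


def value_iteration(p0=0.0, tol=1e-10, max_iter=1000):
    """Class 2: start from p0 = 0 (no initial policy needed), one cheap update per step."""
    p, p_trace, stable_trace = p0, [p0], []
    for _ in range(max_iter):
        k = greedy_gain(p)
        stable_trace.append(is_stabilizing(k))  # intermediate gains may be unstable
        p = Q + R * k * k + closed_loop(k) ** 2 * p
        p_trace.append(p)
        if abs(p_trace[-1] - p_trace[-2]) < tol:
            break
    return greedy_gain(p), p_trace, stable_trace


def riccati_closed_form():
    """Positive root of B^2 p^2 + (R - Q*B^2 - A^2*R) p - Q*R = 0 (scalar Riccati equation)."""
    c = R - Q * B * B - A * A * R
    return (-c + math.sqrt(c * c + 4.0 * B * B * Q * R)) / (2.0 * B * B)


if __name__ == "__main__":
    print("Riccati solution:", riccati_closed_form())
    k_pi, p_pi, s_pi = policy_iteration(k0=1.0)
    print(f"PI: k={k_pi:.6f} p={p_pi[-1]:.6f} steps={len(p_pi)} all stabilizing={all(s_pi)}")
    k_vi, p_vi, s_vi = value_iteration()
    print(f"VI: k={k_vi:.6f} p={p_vi[-1]:.6f} steps={len(p_vi) - 1} first gain stabilizing={s_vi[0]}")
```

Both iterations converge to the Riccati solution. In this example, the policy-iteration costs decrease monotonically and every intermediate gain is stabilizing. The value-iteration costs increase monotonically from zero, and the early gains need not be stabilizing. In the matrix case, each policy-evaluation step solves a Lyapunov equation, whereas a value-iteration step is a single Riccati-type update. This is one way to see the computational trade-off mentioned in the abstract.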
