Concepedia

Publication | Closed Access

Adaptive Optimal Control of Unknown Constrained-Input Systems Using Policy Iteration and Neural Networks

Citations: 451

References: 34

Year: 2013

TLDR

The paper develops an online policy iteration algorithm that learns the continuous-time optimal control for unknown constrained-input systems. The algorithm runs on an actor-critic neural network structure augmented with an online NN identifier, so complete knowledge of the system dynamics is not required, and all three networks are tuned simultaneously. A novel learning rule drives the identifier weights exponentially fast to small neighborhoods of their ideal values, and experience replay (recorded past data used alongside current data) provides an easy-to-check persistence of excitation condition. The paper shows how identifier weight errors affect critic convergence, guarantees stability of the combined actor, critic, system state, and identifier, shows convergence to a near-optimal control law, and illustrates the method with a simulation example.

Abstract

This paper presents an online policy iteration (PI) algorithm to learn the continuous-time optimal control solution for unknown constrained-input systems. The proposed PI algorithm is implemented on an actor-critic structure where two neural networks (NNs) are tuned online and simultaneously to generate the optimal bounded control policy. The requirement of complete knowledge of the system dynamics is obviated by employing a novel NN identifier in conjunction with the actor and critic NNs. It is shown how the identifier weights estimation error affects the convergence of the critic NN. A novel learning rule is developed to guarantee that the identifier weights converge to small neighborhoods of their ideal values exponentially fast. To provide an easy-to-check persistence of excitation condition, the experience replay technique is used. That is, recorded past experiences are used simultaneously with current data for the adaptation of the identifier weights. Stability of the whole system consisting of the actor, critic, system state, and system identifier is guaranteed while all three networks undergo adaptation. Convergence to a near-optimal control law is also shown. The effectiveness of the proposed method is illustrated with a simulation example.
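The experience replay idea in the abstract, where recorded past data are used together with current data to adapt the identifier weights, can be illustrated with a minimal sketch. It is not the paper's identifier: it assumes a static, scalar, linear-in-parameters model y = w^T phi(x), a plain gradient update, and a rank test on the stored regressors as the "easy-to-check" excitation condition. The names `phi`, `replay_rank_ok`, `update`, and `train`, along with the basis, gains, and sample points, are all hypothetical choices made for illustration.

```python
import numpy as np

def phi(x):
    """Hypothetical regressor vector (illustrative basis, not from the paper)."""
    return np.array([1.0, x, x * x])

def replay_rank_ok(memory):
    """Rank test: recorded regressors must span the full weight space."""
    if not memory:
        return False
    stacked = np.stack([p for p, _ in memory], axis=1)
    return np.linalg.matrix_rank(stacked) == stacked.shape[0]

def update(w_hat, current, memory, gain, dt):
    """One Euler step of a gradient law using current plus recorded errors."""
    grad = np.zeros_like(w_hat)
    for p, y in [current] + memory:
        grad += p * (w_hat @ p - y)
    return w_hat - dt * gain * grad

def train(w_true, memory, x_now=0.5, steps=2000, gain=1.0, dt=0.05):
    """Run the update with a constant (non-exciting) current input x_now."""
    w_hat = np.zeros_like(w_true)
    p_now = phi(x_now)
    for _ in range(steps):
        w_hat = update(w_hat, (p_now, w_true @ p_now), memory, gain, dt)
    return np.linalg.norm(w_hat - w_true)

# Assumed "ideal" weights and three recorded samples (illustrative values).
w_true = np.array([1.5, -0.7, 2.0])
memory = [(phi(x), w_true @ phi(x)) for x in (-1.0, 0.0, 1.0)]

err_with_replay = train(w_true, memory)
err_without_replay = train(w_true, [])
print(replay_rank_ok(memory), err_with_replay, err_without_replay)
```

In this toy setup the current input is held constant, so the current data alone constrain only one direction in weight space and the estimate stalls. Adding the three recorded samples, which pass the rank test, lets the same update recover the weights.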
