Inverse Reinforcement Learning in Tracking Control Based on Inverse Optimal Control

TLDR

The paper proposes an inverse reinforcement learning (RL) algorithm that learns an unknown performance objective function for tracking control. The algorithm combines an optimal control update, a gradient-descent correction, and an inverse optimal control (IOC) update, and is developed in one model-based and two model-free variants. The paper clarifies the relation between inverse RL and IOC, shows that the reward weight generating a target control policy may not be unique, and characterizes the set of all weights that generate the same policy. Simulation experiments are presented to show the effectiveness of the algorithms.

Abstract

This article provides a novel inverse reinforcement learning (RL) algorithm that learns an unknown performance objective function for tracking control. The algorithm combines three steps: 1) an optimal control update; 2) a gradient descent correction step; and 3) an inverse optimal control (IOC) update. The new algorithm clarifies the relation between inverse RL and IOC. It is shown that the reward weight of an unknown performance objective that generates a target control policy may not be unique. We characterize the set of all weights that generate the same target control policy. We develop a model-based algorithm and, further, two model-free algorithms for systems with unknown model information. Finally, simulation experiments are presented to show the effectiveness of the proposed algorithms.
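
The non-uniqueness of reward weights noted above can be seen in a deliberately simplified setting. The sketch below is not the article's algorithm: it assumes a scalar continuous-time linear-quadratic regulator (rather than the tracking problem), with dynamics dx/dt = a x + b u and cost integral of q x^2 + r u^2. The function name `lqr_gain` and all numeric values are hypothetical, chosen only for illustration. In this setting the optimal gain depends only on the ratio q/r, so scaling both weights by the same positive constant leaves the policy unchanged.

```python
import math


def lqr_gain(a: float, b: float, q: float, r: float) -> float:
    """Optimal gain k (policy u = -k x) for the scalar system dx/dt = a x + b u
    with cost integral of (q x^2 + r u^2). Illustrative assumption, not the
    article's tracking formulation."""
    if q <= 0 or r <= 0 or b == 0:
        raise ValueError("require q > 0, r > 0, b != 0")
    # Stabilizing root of the scalar Riccati equation 2 a p - (b^2 / r) p^2 + q = 0.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


# Two weight pairs with the same ratio q/r, and one with a different ratio.
k_base = lqr_gain(a=1.0, b=1.0, q=2.0, r=1.0)
k_scaled = lqr_gain(a=1.0, b=1.0, q=20.0, r=10.0)
k_other = lqr_gain(a=1.0, b=1.0, q=3.0, r=1.0)

print("same policy under scaling:", math.isclose(k_base, k_scaled))
print("same policy with different ratio:", math.isclose(k_base, k_other))
```

Here an observer who sees only the policy u = -k x cannot distinguish (q, r) = (2, 1) from (20, 10), which is the kind of ambiguity an inverse RL or IOC method has to account for.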
