Publication | Closed Access
SWIRL: A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards
Citations: 62 | References: 50 | Year: 2018
Artificial Intelligence, Engineering, Machine Learning, Sequential Learning, Robot Tasks, Motor Control, Learning Control, Robot Learning, Cognitive Science, Robotics, Autonomous Learning, Pure Reinforcement Learning, Sequential Decision Making, Computer Science, Inverse Reinforcement Learning, Reward Hacking, Automation, Delayed Rewards, Behavioral Cloning, Object Manipulation
SWIRL is a hybrid policy-search algorithm that blends exploration and demonstration to learn robot tasks with delayed rewards. It applies unsupervised learning to a few expert demonstrations to structure exploration, models a long-horizon task as a sequence of local reward functions and subtask transition conditions, and runs Q-learning over this approximation to compute a policy. Experiments suggest that SWIRL requires far fewer rollouts than pure reinforcement learning and fewer demonstrations than behavioral cloning: it reaches the maximum reward on a simulated parallel-parking task with 85% fewer rollouts than Q-learning, and achieves a 36% relative reward improvement over behavioral cloning with segmentation on a physical deformable-sheet tensioning task.
We present sequential windowed inverse reinforcement learning (SWIRL), a policy search algorithm that is a hybrid of exploration and demonstration paradigms for robot learning. We apply unsupervised learning to a small number of initial expert demonstrations to structure future autonomous exploration. SWIRL approximates a long time horizon task as a sequence of local reward functions and subtask transition conditions. Over this approximation, SWIRL applies Q-learning to compute a policy that maximizes rewards. Experiments suggest that SWIRL requires significantly fewer rollouts than pure reinforcement learning and fewer expert demonstrations than behavioral cloning to learn a policy. We evaluate SWIRL in two simulated control tasks, parallel parking and a two-link pendulum. On the parallel parking task, SWIRL achieves the maximum reward on the task with 85% fewer rollouts than Q-learning, and one-eighth of the demonstrations needed by behavioral cloning. We also consider physical experiments on surgical tensioning and cutting deformable sheets using a da Vinci surgical robot. On the deformable tensioning task, SWIRL achieves a 36% relative improvement in reward compared with a baseline of behavioral cloning with segmentation.
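The idea of running Q-learning over a sequence of local rewards and subtask transition conditions can be illustrated with a toy sketch. This is not the authors' implementation. The 1-D corridor, the hand-picked `SUBGOALS`, the distance-based local reward, and all hyperparameters below are assumptions made for illustration; in SWIRL the segments and rewards are inferred from demonstrations, a step this sketch skips entirely.

```python
import random

# Hypothetical toy task: a corridor with positions 0..N-1.
N = 10
SUBGOALS = [7, 2]   # assumed subtask targets, to be reached in this order
ACTIONS = [-1, 1]   # move left or right

def step(pos, seg, action):
    """Apply one action; return (next_pos, next_seg, reward, done)."""
    pos = min(max(pos + action, 0), N - 1)
    # Local reward of the active segment: negative distance to its subgoal.
    reward = -abs(pos - SUBGOALS[seg])
    # Transition condition: reaching the subgoal activates the next segment.
    if pos == SUBGOALS[seg]:
        seg += 1
    return pos, seg, reward, seg == len(SUBGOALS)

def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning over the augmented state (position, segment index)."""
    rng = random.Random(seed)
    q = {}  # maps (pos, seg, action_index) to a value; missing entries are 0.0
    for _ in range(episodes):
        pos, seg = 0, 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                a = max(range(2), key=lambda i: q.get((pos, seg, i), 0.0))
            npos, nseg, r, done = step(pos, seg, ACTIONS[a])
            best_next = 0.0 if done else max(
                q.get((npos, nseg, i), 0.0) for i in range(2))
            old = q.get((pos, seg, a), 0.0)
            q[(pos, seg, a)] = old + alpha * (r + gamma * best_next - old)
            pos, seg = npos, nseg
            if done:
                break
    return q

def greedy_rollout(q, max_steps=50):
    """Follow the greedy policy from the start state; return visited positions."""
    pos, seg, path = 0, 0, [0]
    for _ in range(max_steps):
        a = max(range(2), key=lambda i: q.get((pos, seg, i), 0.0))
        pos, seg, _, done = step(pos, seg, ACTIONS[a])
        path.append(pos)
        if done:
            break
    return path

if __name__ == "__main__":
    print(greedy_rollout(train()))
```

Augmenting the state with the segment index is what lets a single Q-table represent a different local objective in each subtask: the same position can favor moving right during the first segment and left during the second. The dense local reward replaces the single delayed reward that plain Q-learning would otherwise have to discover by exploration.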