Coverage Path Planning with Proximal Policy Optimization in a Grid-based Environment

Abstract

We present an online approach to coverage path planning in 2D grid environments based on reinforcement learning. We used an actor-critic architecture with convolutional layers to learn the agent's policy from simulated limited-range sensor observations. We experimented with different reward functions and network architectures to minimize the repetition rate. Our results show that the model generalizes well to unseen environments with complex geometry and dynamic obstacles, and that it learns some optimal trajectory patterns, such as circular and boustrophedon motion. With minor adjustments, the approach may also be suitable for a multi-agent setting, as our simulations show.
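To make two terms from the abstract concrete, the following minimal sketch generates a boustrophedon (back-and-forth) sweep over an obstacle-free grid and computes a repetition rate for a path. The function names are hypothetical, and the definition of repetition rate used here (the fraction of steps that land on an already visited cell) is an assumption made for illustration; the paper's exact metric may differ.

```python
def boustrophedon_path(rows, cols):
    """Return a back-and-forth sweep over an obstacle-free rows x cols grid."""
    path = []
    for r in range(rows):
        # Even rows go left to right, odd rows right to left.
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


def repetition_rate(path):
    """Fraction of steps that revisit a cell (assumed definition)."""
    if not path:
        return 0.0
    seen = set()
    repeats = 0
    for cell in path:
        if cell in seen:
            repeats += 1
        seen.add(cell)
    return repeats / len(path)


sweep = boustrophedon_path(3, 4)
print(len(sweep), repetition_rate(sweep))
```

On an obstacle-free grid the sweep visits each cell exactly once, so under this definition its repetition rate is zero. Obstacles and limited sensing are what make the learned policy's task harder than this idealized case.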
