The optimization of path planning for multi-robot system using Boltzmann Policy based Q-learning algorithm

Abstract

Path planning is a fundamental technique for tasks such as solving mazes or moving robots through open fields with obstacles. Q-learning is a model-free reinforcement learning method that can be used to optimize path planning for robots in a multi-robot collaboration system. However, the traditional Q-learning algorithm is relatively inefficient because it adopts a random exploration policy. In this paper, a Boltzmann policy based Q-learning algorithm is proposed and applied to the problem of path planning optimization in a multi-robot system. The method has two parts: Q-learning and the Boltzmann policy. Q-learning is a grid-based algorithm that can solve low-dimensional path planning problems. The Boltzmann policy selects actions probabilistically, in the manner of simulated annealing, which helps the search avoid being trapped in local optima and move toward a globally optimal solution. Performance is evaluated in Player/Stage, and the results show that the proposed Q-learning algorithm based on the Boltzmann policy remarkably improves the efficiency of the multi-robot system, reducing the number of explorations and speeding up convergence.
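The combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a single robot on a small grid, and the grid size, obstacle positions, reward values, learning parameters, and geometric temperature decay are all assumptions chosen for illustration. Actions are drawn with probability proportional to exp(Q(s, a) / T), and the temperature T is lowered after each episode so that the behaviour shifts from exploration toward exploitation.

```python
import math
import random

# Hypothetical action set: up, down, left, right on a grid.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def boltzmann_probs(q_values, temperature):
    """Boltzmann (softmax) distribution over actions for one state."""
    m = max(q_values)  # subtract the max so exp() never overflows
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(q_values, temperature, rng):
    """Sample an action index according to the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def step(state, action, size, obstacles, goal):
    """Assumed environment: blocked moves leave the robot in place."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
    if not inside or nxt in obstacles:
        return state, -1.0, False  # assumed penalty for hitting a wall or obstacle
    if nxt == goal:
        return nxt, 10.0, True  # assumed reward for reaching the goal
    return nxt, -0.1, False  # assumed small cost per move


def train(size=5, obstacles=frozenset({(1, 1), (2, 3)}), start=(0, 0),
          goal=(4, 4), episodes=500, alpha=0.5, gamma=0.9,
          t_start=5.0, t_min=0.05, decay=0.99, seed=0):
    """Tabular Q-learning with Boltzmann exploration and an annealed temperature."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(size) for c in range(size)}
    temperature = t_start
    for _ in range(episodes):
        state = start
        for _ in range(200):  # cap on steps per episode
            action = choose_action(q[state], temperature, rng)
            nxt, reward, done = step(state, action, size, obstacles, goal)
            # Standard Q-learning target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
        # Assumed annealing schedule: geometric decay with a lower bound.
        temperature = max(t_min, temperature * decay)
    return q
```

At a high temperature the action probabilities are close to uniform, which resembles random exploration; at a low temperature the policy is nearly greedy. A multi-robot version would additionally need some way to coordinate the robots or share learned values, which this sketch does not attempt.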
