A Q-Learning Approach to Flocking With UAVs in a Stochastic Environment

TLDR

Unmanned aerial vehicles have been deployed for diverse military and civilian tasks, and scaling to multiple units requires autonomous coordination akin to natural swarms. The study investigates whether small fixed-wing UAVs can achieve flocking behavior through a model-free reinforcement learning framework. Followers apply Peng's Q(λ) with a variable learning rate in a leader-follower Markov decision process, and the resulting policies are benchmarked against stochastic optimal control solutions by comparing average flight cost. Simulations indicate that the learning approach enables agents to flock in a leader-follower topology despite nonstationary stochastic disturbances.

Abstract

In the past two decades, unmanned aerial vehicles (UAVs) have demonstrated their efficacy in supporting both military and civilian applications, where tasks can be dull, dirty, dangerous, or simply too costly with conventional methods. Many of these applications contain tasks that can be executed in parallel, so the natural progression is to deploy multiple UAVs working together as a force multiplier. Doing so, however, requires autonomous coordination among the UAVs, similar to the swarming behaviors seen in animals and insects. This paper examines flocking with small fixed-wing UAVs as a model-free reinforcement learning problem. In particular, the followers employ Peng's Q(λ) with a variable learning rate to learn a control policy that facilitates flocking in a leader-follower topology. The problem is structured as a Markov decision process in which the agents are modeled as small fixed-wing UAVs that experience stochasticity due to disturbances such as wind and control noise, as well as weight and balance issues. Learned policies are compared with those obtained through stochastic optimal control (i.e., dynamic programming) by evaluating the average cost incurred during flight according to a cost function. Simulation results demonstrate that the proposed learning approach is feasible: agents learn to flock in a leader-follower topology while operating in a nonstationary stochastic environment.
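
To make the learning rule concrete, the following is a minimal tabular sketch of a Peng's Q(λ) update with a visit-count-based learning rate, written for a cost-minimizing agent. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the state space, the learning-rate schedule, or the cost function, so the one-dimensional "offset from the leader" environment, the schedule `1 / visits ** omega`, the absolute-offset cost, and every function and parameter name below are hypothetical.

```python
import random

# Hypothetical toy problem: the follower's offset from its desired slot
# behind the leader, discretized to 5 cells. Cell 2 means "in position".
N_STATES = 5
TARGET = 2
ACTIONS = (-1, 0, 1)  # move left, hold, move right


def env_step(s, a, rng, p_noise=0.1):
    """Apply an action; with probability p_noise a disturbance shifts the move."""
    move = ACTIONS[a]
    if rng.random() < p_noise:
        move += rng.choice((-1, 1))  # stand-in for wind or control noise
    s_next = min(N_STATES - 1, max(0, s + move))
    return s_next, float(abs(s_next - TARGET))  # assumed cost: distance from slot


def peng_q_lambda_step(Q, trace, visits, s, a, cost, s_next,
                       gamma=0.9, lam=0.6, omega=0.7):
    """One in-place Peng's Q(lambda) update, minimizing cost instead of maximizing reward."""
    v_s = min(Q[s])            # greedy value of the current state
    v_next = min(Q[s_next])    # greedy value of the next state
    visits[s][a] += 1
    alpha = 1.0 / visits[s][a] ** omega  # assumed variable learning rate
    e_prime = cost + gamma * v_next - v_s  # error propagated along the traces
    e = cost + gamma * v_next - Q[s][a]    # one-step error for the current pair
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            trace[i][j] *= gamma * lam
            Q[i][j] += alpha * trace[i][j] * e_prime
    Q[s][a] += alpha * e
    trace[s][a] += 1.0


def train(episodes=2000, steps=20, epsilon=0.2, seed=0):
    """Epsilon-greedy training loop; traces are reset at the start of each episode."""
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    visits = [[0] * len(ACTIONS) for _ in range(N_STATES)]
    for _episode in range(episodes):
        trace = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
        s = rng.randrange(N_STATES)
        for _step in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = min(range(len(ACTIONS)), key=lambda j: Q[s][j])
            s_next, cost = env_step(s, a, rng)
            peng_q_lambda_step(Q, trace, visits, s, a, cost, s_next)
            s = s_next
    return Q


if __name__ == "__main__":
    Q = train()
    policy = [ACTIONS[min(range(len(ACTIONS)), key=lambda j: Q[s][j])]
              for s in range(N_STATES)]
    print("greedy move per offset cell:", policy)
```

Unlike Watkins's Q(λ), this update does not cut the eligibility traces after an exploratory action, which is the distinguishing feature of Peng's variant. A comparison against dynamic programming, as described in the abstract, would additionally require the transition model of the toy environment and is not sketched here.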
