Toward Scalable Multirobot Control: Fast Policy Learning in Distributed MPC
Xinglong Zhang, Wei Pan, Cong Li, Xin Xu, Xiangke Wang, Ronghua Zhang, Dewen Hu
TL;DR
The paper tackles scalable, real-time cooperative control of large-scale multirobot systems by replacing computationally intensive online nonlinear programming (NLP) solvers with a distributed policy-learning approach that yields explicit closed-loop DMPC policies. It introduces DLPC, a distributed online actor-critic framework that updates policies forward in time within each prediction interval, with stability guaranteed through a practical Lyapunov-based condition and safety handled through a force-field-inspired barrier design. The approach performs online policy learning for up to 10,000 robots with linear growth in computational load, and it transfers from simulation to real hardware in both drone and wheeled-robot experiments. These results mark a step toward fast, scalable, and safe optimization-based control for large multirobot systems; model-free extensions and time-varying networks are noted as promising directions.
Abstract
Distributed model predictive control (DMPC) is a promising approach to optimal cooperative control in multirobot systems (MRS). However, real-time DMPC implementation relies on numerical optimization tools to periodically calculate local control sequences online. This process is computationally demanding and scales poorly to large-scale, nonlinear MRS. This article proposes a novel distributed learning-based predictive control (DLPC) framework for scalable multirobot control. Unlike conventional DMPC methods that calculate open-loop control sequences, our approach centers on a computationally fast and efficient distributed policy learning algorithm that generates explicit closed-loop DMPC policies for MRS without using numerical solvers. The policy learning is executed incrementally and forward in time in each prediction interval through an online distributed actor-critic implementation. The control policies are successively updated in a receding-horizon manner, enabling fast and efficient policy learning with a closed-loop stability guarantee. The learned control policies can be deployed online to MRS with varying numbers of robots, enhancing scalability and transferability for large-scale MRS. Furthermore, we extend our methodology to address the multirobot safe learning challenge through a force field-inspired policy learning approach. We validate our approach's effectiveness, scalability, and efficiency through extensive experiments on cooperative tasks of large-scale wheeled robots and multirotor drones. Our results demonstrate the rapid learning and deployment of DMPC policies for MRS with scales up to 10,000 units.
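
The contrast between an explicit closed-loop policy and an online solver can be illustrated with a toy sketch. This is not the DLPC algorithm: a fixed linear gain stands in for a learned policy, and the ring communication graph, single-integrator dynamics, scalar states, and all function names are assumptions made only for illustration. The sketch shows that when each robot evaluates a local feedback law on its own and its neighbors' states, one control step costs a fixed amount per robot, so the same policy runs unchanged for 10 or 1,000 robots.

```python
# Toy illustration only: a fixed gain replaces the learned actor, and the
# ring graph and single-integrator dynamics are assumptions of this sketch.

def local_policy(x_i, neighbor_states, gain=0.3):
    """Explicit closed-loop law: steer toward the mean of the neighbors' states."""
    mean = sum(neighbor_states) / len(neighbor_states)
    return -gain * (x_i - mean)


def step(states, gain=0.3):
    """One synchronous update on a ring graph; work grows linearly with N."""
    n = len(states)
    controls = [
        local_policy(states[i], [states[(i - 1) % n], states[(i + 1) % n]], gain)
        for i in range(n)
    ]
    # Single-integrator dynamics: x_i <- x_i + u_i.
    return [x + u for x, u in zip(states, controls)]


def initial_states(n_robots):
    """Deterministic scalar initial states with values in 0..6."""
    return [float(i % 7) for i in range(n_robots)]


def spread(states):
    """Disagreement measure: gap between the largest and smallest state."""
    return max(states) - min(states)


def rollout(n_robots, n_steps=50):
    """Apply the same local policy for n_steps, for any number of robots."""
    states = initial_states(n_robots)
    for _ in range(n_steps):
        states = step(states)
    return states
```

Calling `rollout(10)` and `rollout(1000)` uses the identical `local_policy`; only the number of per-robot evaluations changes. In the paper's setting the policy is learned online by a distributed actor-critic scheme rather than fixed in advance.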
