Do We Run Large-scale Multi-Robot Systems on the Edge? More Evidence for Two-Phase Performance in System Size Scaling
Jonas Kuckling, Robin Luckey, Viktor Avrutin, Andrew Vardy, Andreagiovanni Reina, Heiko Hamann
TL;DR
The paper investigates scalability in large-scale multi-robot systems and identifies a two-phase performance pattern that emerges near a critical swarm size $N_c$, which puts reliability at risk when a system is tuned for peak performance. It studies three scenarios (warehouse AFLEs, object clustering, and emergent taxis) and uses two analytical models to explain and predict the observed bimodal behavior and long transients: a queueing-theoretic model with queue-length-dependent service time and a population-dynamics model with three robot states. Key findings include evidence of bistability and hysteresis, as well as critical slowing down near bifurcation points, indicating that small changes in swarm size can cause large jumps in performance. The work highlights practical implications for designing robustly scalable robot swarms and offers modeling tools to anticipate and mitigate edge-of-scale failures in real deployments.
Abstract
With increasing numbers of mobile robots arriving in real-world applications, more robots coexist in the same space, interact, and possibly collaborate. Methods to provide such systems with system size scalability are known, for example, from swarm robotics. Example strategies are self-organizing behavior, a strictly decentralized approach, and limiting robot-robot communication. Despite applying such strategies, any multi-robot system breaks above a certain critical system size (i.e., number of robots) as too many robots share a resource (e.g., space, communication channel). We provide additional simulation-based evidence that, at these critical system sizes, the system performance separates into two phases: nearly optimal and minimal performance. We speculate that in real-world applications that are configured for optimal system size, the supposedly high-performing system may actually live on borrowed time as it is on a transient to breakdown. We provide two modeling options (based on queueing theory and a population model) that may help to support this reasoning.
