Visual collective behaviors on spherical robots
Diego Castro, Christophe Eloy, Franck Ruffier
TL;DR
This work addresses vision-based collective motion by implementing a robot-in-the-loop visual flocking model on 10 spherical Sphero robots, using only early-vision cues (angular position, angular size, optic flow) augmented by a virtual anchor to confine the flock. The authors formulate per-robot angular velocity components from retinal cues, introduce an avoidance term, and rely on an external panorama reconstructed for each robot, enabling independent, parallel control without inter-robot communication. The key contributions include the virtual anchor mechanism, a simulation-to-robot validation showing swarming, milling, and bistable phases, and phase-diagram alignment between simulation and hardware, indicating a minimal yet effective visual model. This approach bridges the gap between numerical flocking models and physical experiments, offering a scalable, sensor-grounded framework for visual collective behaviors in robotics with practical implications for swarm robotics and embodied AI.
Abstract
Implementations of collective motion traditionally disregard the limited sensing capabilities of an individual and instead assume an omniscient perception of the environment. This study implements a visual flocking model in a "robot-in-the-loop" approach to reproduce these behaviors with a flock composed of 10 independent spherical robots. The model achieves robotic collective motion using only the panoramic visual information available to each robot, namely the retinal position, optical size, and optic flow of the neighboring robots. We introduce a virtual anchor to confine the collective robotic movements so as to avoid wall interactions. For the first time, a simple visual robot-in-the-loop approach succeeds in reproducing several collective motion phases, in particular swarming and milling. Another milestone achieved by this model is bridging the gap between simulation and physical experiments by demonstrating nearly identical behaviors in both environments with the same visual model. To conclude, we show that our minimal visual collective motion model is sufficient to recreate most collective behaviors on a robot-in-the-loop system that is scalable, behaves as numerical simulations predict, and is easily comparable to traditional models.
