
Purely vision-based collective movement of robots

David Mezey, Renaud Bastien, Yating Zheng, Neal McKee, David Stoll, Heiko Hamann, Pawel Romanczuk

TL;DR

This work presents a decentralized, purely vision-based terrestrial swarm, where robots achieve polarized motion with highly effective collision avoidance exclusively through simple visual interactions.

Abstract

Collective movement inspired by animal groups promises inherited benefits for robot swarms, such as enhanced sensing and efficiency. However, while animals move in groups using only their local senses, robots often obey central control or use direct communication, introducing systemic weaknesses to the swarm. In the hope of addressing such vulnerabilities, developing bio-inspired decentralized swarms has been a major focus in recent decades. Yet, creating robots that move efficiently together using only local sensory information remains an extraordinary challenge. In this work, we present a decentralized, purely vision-based swarm of terrestrial robots. Within this novel framework, robots achieve collisionless, polarized motion exclusively through minimal visual interactions, computing everything on board based on their individual camera streams, making central processing or direct communication obsolete. With agent-based simulations, we further show that using this model, even with a strictly limited field of view and within confined spaces, ordered group motion can emerge, while also highlighting key limitations. Our results offer a multitude of practical applications, from hybrid societies coordinating collective movement without any common communication protocol, to advanced, decentralized vision-based robot swarms capable of diverse tasks in ever-changing environments.

Paper Structure

This paper contains 28 sections, 24 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: Vision-based Model, A: A focal (red) agent with orientation $\psi$ and 5 numbered visible neighbors (grey). The focal agent has a full, $2\pi$ FOV (shaded pink) and an unlimited visual range. The projection of the visual field of the focal agent (represented on a dotted circle) is 1 where other agents are visible (thicker dark arcs), 0 otherwise. B: The resulting unfolded 1-dimensional visual projection field $V(\phi, t)$ of the focal agent. Projection blobs are numbered according to the corresponding visible agents in panel A. When agents partially occlude each other, visual blobs merge (e.g., agents 4 and 5). C: The front-back (left) and left-right (right) social force fields shaping agent velocity and turning rate, respectively. The map depicting $F_{v_i}^{soc}$ is grey (or purple) at those relative positions where a focal agent at the origin would accelerate (or decelerate) in response to another agent of identical body size at that position. The map depicting $F_{\psi_i}^{soc}$ is grey (or red) at those relative positions where the response would be a right (or left) turn. The exact scales of both maps depend on the model parameters: $\alpha_0$ and $\beta_0$ control the response amplitudes, while $\alpha_1$ and $\beta_1$ control the equilibrium distances from the origin at which no response is given. D: Emergent movement patterns of the vision-based model in toroidal space with full field of view (as in Fig. \ref{fig:fig_FOV_metrics}): Stuck-in-place (SIP), Flocking (F), Swarming (S), Unordered (X), Milling (M), Moving swarms (pS). Colors along the agents' trajectories represent their orientations $\psi$ (see color wheel on the right). An illustrative code sketch of the projection field and the two responses is given after this figure list.
  • Figure 2: Limiting the active FOV, A: A focal (red) agent with orientation $\psi$, a limited FOV between relative angles $[-\phi_L, \phi_L]$ (highlighted pink between pink dotted lines), and three other agents (grey) that are fully visible (2), partially visible (3), or not visible (1). In-sight parts of other agents are colored darker grey. B: The original 1-dimensional visual projection field $V(\phi, t)$ with black visual blobs for in-sight and light grey for out-of-sight visual information. The active field of view is highlighted pink between pink dotted lines. C: The resulting 1-dimensional limited projection field $V_i^{\phi_L}(\phi, t)$. Partial visual blobs (3) are fully recovered to avoid boundary effects caused by the limited FOV (see Sec. \ref{sup_limfov}). D: Emergent movement patterns (as in Fig. \ref{fig:fig_FOV_metrics}) of $N_A=10$ agents on a torus arena with limited FOV: Lines (L), fragmented Leader-Follower (frLeFo), fragmented Flocking (frF), Unordered (X), Leader-Follower (LeFo), Flocking (F). Colors along the agents' trajectories represent their orientations $\psi$ (see color wheel on the right). An illustrative sketch of the FOV masking and blob recovery is given after this figure list.
  • Figure 3: Effects of a limited FOV on a torus. Emergent collective movement patterns as depicted in Fig. \ref{fig:fig_FOV_patterns} (top row) and summary metrics (remaining rows) of the resulting collective movement for different fields of view (columns) over different $\alpha_0$ (y axis) and $\beta_0$ (x axis) parameters. Simulations with full FOV align with the results of Bastien and Romanczuk (2020). Observed movement patterns are Unordered (X), Leader-Follower (LeFo), fragmented Leader-Follower (frLeFo), Flocking (F), fragmented Flocking (frF), and Lines (L). Reducing the agents' FOV facilitates polarized movement via leader-follower (LeFo, frLeFo) dynamics. Due to the blind spot behind agents, groups with lower FOVs are less cohesive than those with a full field of view. Collisionless polarized flocking was observed for all FOV values larger than 25%. A sketch of the polarization and cohesion metrics is given after this figure list.
  • Figure 4: Effects of a confined arena. Summary metrics (rows) of the vision-based collective movement with different FOVs (columns) over different $\alpha_0$ (y axis) and $\beta_0$ (x axis) parameters in a confined arena with reflective walls. Introducing boundaries generally decreases the polarization and cohesion of the groups and makes it challenging to identify stable movement patterns.
  • Figure 5: Robot Platform, A: Close-up image of three vision-based robots. B: Computational steps of the robot controller implementing vision-based collective movement. 1: A camera image is acquired and unwarped to remove the fisheye lens distortion. 2: Other robots are detected in the image on board using a tensor processing unit (TPU). 3: Detection boxes (yellow) are projected onto the horizontal axis to create the limited projection field. 4: Partial projection blobs are recovered using the height of the detection box to create the full visual projection field (VPF). 5: The VPF is fed into the vision-based model to calculate the desired robot state. 6: Motor commands are calculated and executed by the robot. C: Hardware components of a vision-based robot: 1: Camera module, 2: Battery, 3: 3D-printed structural scaffolding, 4: Socket for Raspberry Pi 4B and Google Coral Edge TPU, 5: Thymio II base robot. D: Swarm of ten vision-based robots. An illustrative sketch of steps 3 and 4 (detection boxes to VPF) is given after this figure list.
  • ...and 3 more figures
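
To make Figure 1 concrete, the following is a minimal Python sketch of how a focal agent's one-dimensional visual projection field $V(\phi, t)$ and the two responses (front-back and left-right) could be computed from neighbor positions. The discretization, the function names, and the exact force kernels are illustrative assumptions: the cosine/sine weighting with amplitudes $\alpha_0$, $\beta_0$ and equilibrium-distance parameters $\alpha_1$, $\beta_1$ follows the general form of the vision-based model the paper builds on, not the authors' reference implementation.

```python
import numpy as np

N_PHI = 360                                   # angular resolution of V(phi)
PHI = np.linspace(-np.pi, np.pi, N_PHI, endpoint=False)
D_PHI = 2 * np.pi / N_PHI

def visual_projection_field(focal_pos, focal_psi, others, radius):
    """Binary 1-D visual projection field: 1 where another agent
    (a disk of the given radius) is visible, 0 elsewhere.
    Occluding agents merge into a single blob automatically."""
    V = np.zeros(N_PHI)
    for pos in others:
        dx, dy = pos[0] - focal_pos[0], pos[1] - focal_pos[1]
        dist = np.hypot(dx, dy)
        if dist <= radius:                    # bodies overlap: field saturates
            return np.ones(N_PHI)
        bearing = np.arctan2(dy, dx) - focal_psi       # angle in the body frame
        half_width = np.arcsin(radius / dist)          # half of the subtended angle
        delta = (PHI - bearing + np.pi) % (2 * np.pi) - np.pi
        V[np.abs(delta) <= half_width] = 1.0
    return V

def social_responses(V, alpha0, alpha1, beta0, beta1):
    """Speed (front-back) and turning (left-right) responses computed from V.
    The -V term plus an edge (derivative-squared) term, weighted by cos/sin,
    is an assumed kernel; alpha0/beta0 scale the amplitudes, alpha1/beta1 set
    the distances at which attraction and repulsion balance."""
    dV = np.gradient(V, D_PHI)                # edge detector (periodicity ignored)
    F_v = alpha0 * D_PHI * np.sum(np.cos(PHI) * (-V + alpha1 * dV**2))
    F_psi = beta0 * D_PHI * np.sum(np.sin(PHI) * (-V + beta1 * dV**2))
    return F_v, F_psi
```

In such a sketch, each agent would integrate F_v into its speed and F_psi into its heading at every time step; the patterns in Figure 1D emerge from all agents doing this simultaneously.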
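Figure 2 describes restricting the projection field to an active FOV $[-\phi_L, \phi_L]$ and recovering partially visible blobs so the FOV boundary does not act as a spurious edge. Below is a minimal sketch of one way to do this, assuming a binary field discretized as above; the recovery rule (extending any blob that touches the FOV boundary to its full extent in the unlimited field) is a reading of panel C, not the authors' code.

```python
import numpy as np

def limited_projection_field(V, phi, phi_L):
    """Mask V(phi) to the active FOV [-phi_L, phi_L], then recover blobs cut
    by the FOV boundary by copying their full extent from the unlimited field
    (cf. Fig. 2C), so the cut does not create a false edge."""
    n = len(phi)
    V_lim = np.where(np.abs(phi) <= phi_L, V, 0.0)
    for boundary, step in ((phi_L, 1), (-phi_L, -1)):
        j = int(np.argmin(np.abs(phi - boundary)))     # bin at the FOV edge
        steps = 0
        while V[j % n] > 0 and steps < n:              # walk along the cut blob
            V_lim[j % n] = 1.0
            j += step
            steps += 1
    return V_lim
```

Here V and phi would be the field and angle grid produced by a routine like visual_projection_field above.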
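The "summary metrics" in Figures 3 and 4 include the polarization and cohesion of the group. The paper's exact definitions are not reproduced here; the sketch below shows the standard polarization order parameter (length of the mean heading vector) and one common cohesion proxy (mean distance to the centroid) as illustrative stand-ins. On a toroidal arena, the centroid and distances would additionally need minimum-image handling.

```python
import numpy as np

def polarization(psis):
    """Length of the mean heading vector: 1 = perfectly aligned, ~0 = disordered."""
    headings = np.column_stack((np.cos(psis), np.sin(psis)))
    return float(np.linalg.norm(headings.mean(axis=0)))

def cohesion(positions):
    """Mean distance of agents from the group centroid (smaller = more cohesive).
    Note: on a torus this needs minimum-image corrections, omitted here."""
    centroid = positions.mean(axis=0)
    return float(np.linalg.norm(positions - centroid, axis=1).mean())
```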
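Steps 3 and 4 of the controller in Figure 5B turn bounding boxes from the on-board detector into the visual projection field that the model consumes. The sketch below assumes an unwarped panoramic image whose horizontal axis spans the full FOV and uses the box height as a distance proxy when completing boxes cut off at the image border; the width_per_height calibration factor and all names are assumptions for illustration.

```python
import numpy as np

N_BINS = 360                                   # one bin per degree of the panorama

def boxes_to_vpf(boxes, image_width, width_per_height):
    """Project detection boxes (x_min, x_max, height), given in pixels, onto the
    horizontal axis to build a binary visual projection field. Boxes touching
    the image border are widened using their height as a distance proxy
    (taller box -> closer robot -> wider blob)."""
    vpf = np.zeros(N_BINS)
    px_per_bin = image_width / N_BINS
    for x_min, x_max, height in boxes:
        if x_min <= 0 or x_max >= image_width - 1:        # box cut at the border
            full_width = width_per_height * height        # expected full width (px)
            if x_min <= 0:
                x_min = x_max - full_width
            else:
                x_max = x_min + full_width
        lo, hi = int(x_min / px_per_bin), int(x_max / px_per_bin)
        for b in range(lo, hi + 1):
            vpf[b % N_BINS] = 1.0                         # wrap around the panorama
    return vpf
```

The resulting field would then play the role of the limited projection field fed into the vision-based model in step 5 of the pipeline.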