Learning to flock in open space by avoiding collisions and staying together

Martino Brambati, Antonio Celani, Marco Gherardi, Francesco Ginelli

TL;DR

The paper addresses how cohesive flocking can emerge in open space through a multi-agent reinforcement learning framework that uses Voronoi topological neighbours and a local cost based on ${\cal L}(d)=a/d-b/\sqrt{d}$ to balance alignment and attraction. The learned policy exhibits Vicsek-like dynamics with high polar order, featuring a two-regime behavior: strong alignment at short distances and a flexible mix of alignment and attraction at larger separations, yielding starling-like nonequilibrium liquid structure. The work demonstrates robustness to training scheme (CT vs DT) and to cost-function details, and it shows that short-range repulsion is essential for flocking; results have potential implications for understanding animal behavior and guiding swarm robotics in open environments. Overall, the study provides a concrete mechanism by which staying together while avoiding collisions can naturally give rise to cohesive collective motion in active matter systems.
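To make the shape of the cost concrete, the following minimal sketch evaluates ${\cal L}(d)=a/d-b/\sqrt{d}$ and locates its minimum analytically. It uses the parameter values $a=1$, $b=2$ quoted in the caption of Figure 1. The function names `cost` and `minimizing_distance` are illustrative and not taken from the paper.

```python
import math

def cost(d, a=1.0, b=2.0):
    """Local cost L(d) = a/d - b/sqrt(d) for a positive distance d."""
    if d <= 0:
        raise ValueError("distance must be positive")
    return a / d - b / math.sqrt(d)

def minimizing_distance(a=1.0, b=2.0):
    """Distance at which L(d) is smallest.

    Setting dL/dd = -a/d**2 + b/(2*d**1.5) = 0 gives sqrt(d) = 2a/b.
    """
    return (2.0 * a / b) ** 2

d_min = minimizing_distance()  # equals 1 for a=1, b=2
for d in (0.25, d_min, 4.0, 100.0):
    # The cost grows without bound as d -> 0 (crowding) and rises
    # back towards 0 from below as d grows (separation).
    print(f"d = {d:7.2f}   L(d) = {cost(d):+.3f}")
```

With these parameters the cost diverges at short range, reaches its lowest value $-1$ at $d=1$, and climbs back towards zero at large separations, which is how a single function can penalize both crowding and drifting apart.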

Abstract

We investigate the emergence of cohesive flocking in open, boundless space using a multi-agent reinforcement learning framework. Agents integrate positional and orientational information from their closest topological neighbours and learn to balance alignment and attractive interactions by optimizing a local cost function that penalizes both excessive separation and close-range crowding. The resulting Vicsek-like dynamics is robust to algorithmic implementation details and yields cohesive collective motion with high polar order. The optimal policy is dominated by strong aligning interactions when agents are sufficiently close to their neighbours, and a flexible combination of alignment and attraction at larger separations. We further characterize the internal structure and dynamics of the resulting groups using liquid-state metrics and neighbour exchange rates, finding qualitative agreement with empirical observations in starling flocks. These results suggest that flocking may emerge in groups of moving agents as an adaptive response to the biological imperatives of staying together while avoiding collisions.
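The two global observables that recur in the figures, the polar order parameter (OP) and the mean distance to the centre of mass (MDCM), can be sketched as below. This assumes agents moving in two dimensions with headings given as angles, and it uses the standard Vicsek-model definition of polar order (the norm of the average unit heading vector); the paper's exact definitions may differ in detail. The function names are hypothetical.

```python
import math

def polar_order(headings):
    """Norm of the mean unit heading vector; headings are angles in radians.

    Returns 1 for perfectly aligned agents and values near 0 for
    headings that cancel out.
    """
    n = len(headings)
    mean_x = sum(math.cos(theta) for theta in headings) / n
    mean_y = sum(math.sin(theta) for theta in headings) / n
    return math.hypot(mean_x, mean_y)

def mean_distance_to_com(positions):
    """Average Euclidean distance of the agents from their centre of mass.

    positions is a sequence of (x, y) pairs.
    """
    n = len(positions)
    com_x = sum(x for x, _ in positions) / n
    com_y = sum(y for _, y in positions) / n
    return sum(math.hypot(x - com_x, y - com_y) for x, y in positions) / n

# Toy data (illustrative only): four agents on the corners of a square,
# all heading in the same direction.
square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
print(polar_order([0.3, 0.3, 0.3, 0.3]))
print(mean_distance_to_com(square))
```

A cohesive flock in open space corresponds to an OP close to 1 together with an MDCM that stays bounded over time rather than growing as agents disperse.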

Paper Structure

This paper contains 13 sections, 18 equations, 8 figures.

Figures (8)

  • Figure 1: (a) Cartoon describing the orientational dynamics of agents. At each step the $i$-th agent (blue arrow) chooses the new direction given the information it perceives from its topological neighbours (red arrows) and linearly weights the alignment to the neighbours' mean direction $V_i^t$ and to the direction $R_i^t$ to the local center of mass. Nearest neighbours are evaluated via Voronoi tessellation. (b) The cost function (Eq. \ref{L_x_eq}) with parameters $a=1$ and $b=2$.
  • Figure 2: Mean distance to the centre of mass (MDCM) and order parameter (OP) for both centralized (CT) (blue, $n_e=300$) and decentralized (DT) training (red, $n_e=1000$). (a) MDCM in the last training episode for a group of $N=100$ agents. (b) OP in the last training episode ($N=100$). Insets in (a) and (b) are averages over single episodes. (c) Finite size scaling of the average OP. Inset: Finite size scaling of the average MDCM, the dashed black line marks the power law $\sim \sqrt{N}$. (d) OP vs. noise amplitude for $N=100$. In panels (c)-(d) open red symbols refer to DT with $\nu=\omega=0.7$, while full red symbols to DT with $\nu=0.5$ and $\omega=0.99$.
  • Figure 3: (a) Radial pair distribution function measured for a single configuration in a system of $N=400$ agents trained in the CT regime. Here $\Delta r = 0.05$. (b) Probability distribution of the distance of the nearest neighbours obtained from the last episode of the same simulation.
  • Figure 4: (a) Trained $Q$-matrix of the agents for CT after $n_e=300$ episodes. Dots mark the state-action pairs that minimize the matrix for each state. For better visualization, we show the (state-by-state) normalized matrix $\tilde{Q}(d,\beta)\equiv [Q(d,\beta)-Q^m(d)]/[Q^M(d)-Q^m(d)] \in [0,1]$, where $Q^m(d)=\min_\beta\{Q(d,\beta)\}$ and $Q^M(d)=\max_\beta\{Q(d,\beta)\}$ are respectively the minimum and the maximum values for the state $d$. (b) Final frequency matrix $\Omega$ of the state-action pairs for CT. (c) Trained $Q$-matrix $\langle Q_i \rangle_i$, averaged over all agents for DT after $n_e=1000$ episodes, normalized as in (a). (d) Final total frequency matrix $\sum_i \Omega_i$ for DT. In panels (a) and (c) the optimal policies (see Eq. \ref{optimal}) are marked by orange dots. $N=100$ in all panels.
  • Figure 5: Representation of the average $Q$-matrix $\bar{Q}=\langle Q_i\rangle_i$ trained in the DT scheme for $N=100$ agents. Each curve, labeled by an integer $n$, refers to a specific state $d=n\Delta d$ and shows $\bar{Q}(d, \beta)$ as a function of $\beta$. Error bars represent one standard error. (a) First state, $d<v_0$. (b) States with $d<d^*$. (c) States with $d>d^*$. Note that the vertical scales of the three panels are in the ratio 1:2:40.
  • ...and 3 more figures
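
The state-by-state normalization used in the Figure 4 caption, $\tilde{Q}(d,\beta)=[Q(d,\beta)-Q^m(d)]/[Q^M(d)-Q^m(d)]$, and the selection of the cost-minimizing action per state can be sketched as follows. The $Q$-matrix is stored here as a list of rows, one per state $d$, with one entry per action $\beta$. The toy values, the function names, and the handling of flat rows are assumptions made for illustration only.

```python
def normalize_rows(q_matrix):
    """Rescale each state's row to [0, 1] via (Q - min) / (max - min)."""
    normalized = []
    for row in q_matrix:
        q_min, q_max = min(row), max(row)
        span = q_max - q_min
        # A flat row has no preferred action; mapping it to zeros is a
        # choice made in this sketch, not something stated in the source.
        normalized.append(
            [(q - q_min) / span if span > 0 else 0.0 for q in row]
        )
    return normalized

def minimizing_actions(q_matrix):
    """Index of the action with the lowest Q value for each state.

    Q encodes a cost here, so the preferred action is an argmin.
    Ties resolve to the first minimizing index.
    """
    return [min(range(len(row)), key=row.__getitem__) for row in q_matrix]

# Hypothetical 2-state, 3-action matrix.
toy_q = [
    [3.0, 1.0, 2.0],
    [0.5, 0.5, 4.5],
]
print(normalize_rows(toy_q))
print(minimizing_actions(toy_q))
```

Because the rescaling is monotonic within each row, the minimizing action is the same before and after normalization; the normalization only makes rows with very different value ranges comparable in a single colour map.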