Harnessing swarms for directed migration of interacting active particles via optimal global control

Chiara Calascibetta, Laëtitia Giraldi, Jérémie Bec

TL;DR

Problem: directed transport of swarms in narrow channels is hindered by wall accumulation and band formation. Approach: a minimal lattice model with global uniform actions is optimized using PPO to maximize net forward flux, compared to a simple rule based on orientation. Findings: learned policies converge to near-deterministic rules; in extreme regimes they align with a simple threshold; in the transitional regime they outperform the heuristic by reducing negative events and enabling more mobile patterns. Significance: demonstrates scalable global-control strategies for guiding swarms in confined geometries and highlights when disruption-based or alignment-based controls are advantageous, with potential applications in microfluidics and microrobotics.

Abstract

This study investigates the use of global control strategies to enhance the directed migration of swarms of interacting self-propelled particles confined in a channel. Uncontrolled dynamics naturally leads to wall accumulation, clogging, and band formation due to the interplay between self-organization and confinement. This work explores whether a uniform global control, such as a magnetic field acting on all particles, can optimize collective transport. Using a discrete Vicsek-like model, it is found that simple global alignment controls, optimized via reinforcement learning, efficiently suppress unfavorable configurations and significantly increase the net particle flux along a prescribed channel direction. These results highlight that coarse, system-level observations are sufficient to achieve near-optimal control, even in regimes with strong fluctuations or partial ordering.

Paper Structure

This paper contains 4 sections, 6 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Comparison of swarm statistics without control and under the heuristic policy. (a) Global order parameter $\langle|\boldsymbol{\Pi}|\rangle$ as a function of the alignment strength $g$ for the uncontrolled system ($\bullet$) and under the heuristic control ($\blacksquare$). Inset: averaged flux magnitude, $\langle | \Phi|\rangle$, for the uncontrolled system and averaged flux, $\langle \Phi\rangle$, under control. (b) Probability density function (PDF) of the mean $x$-component of the orientation $\langle \Pi_x \rangle$, for the uncontrolled system at $g\simeq g_{\rm o}^\star$ ($\circ$) and $g > g_{\rm b}^\star$ ($\bullet$), and under the heuristic control at $g\simeq g_{\rm o}^\star$ ($\blacksquare$) and at $g > g_{\rm b}^\star$ ($\square$).
  • Figure 2: Disordered ($g\simeq g_{\rm o}^\star$) and fully ordered ($g >g_{\rm b}^\star$) regimes. (a) Top: PPO policy map in the $(\Pi_x,\Pi_y)$ plane for $g = 1.2$ and $g = 2.1$. Colors indicate action probabilities: red = vertical ($\boldsymbol{\updownarrow}$), blue = horizontal ($\boldsymbol{\leftrightarrow}$), white = no action ($\varnothing$), with intermediate shades denoting stochastic mixtures. Bottom: learning curves, showing the streamwise fluxes concatenated across training episodes. Black dashed lines mark the performance of the heuristic policy. (b-c) Probability density functions of the streamwise flux $\Phi$ at $g\simeq g_{\rm o}^\star$ (b) and $g>g_{\rm b}^\star$ (c), comparing the uncontrolled system ($\bullet$), the heuristic policy ($\blacksquare$), and PPO ($\blacklozenge$).
  • Figure 3: Transitional regime ($g_{\rm o}^\star<g<g_{\rm b}^\star$). (a) Top: PPO policy map in the $(\Pi_x,\Pi_y)$ plane for $g = 1.6$, using the same color code as Fig. 2a. Bottom: associated learning curve, with the heuristic performance shown as a dashed line. (b) PDFs of the flux $\Phi$, comparing the uncontrolled system ($\bullet$), the heuristic policy ($\blacksquare$), and the policy learned from PPO ($\blacklozenge$). (c) Top: time series of $\Phi$ for heuristic ($\blacksquare$) and PPO ($\blacklozenge$) starting from the same initial condition. Bottom: representative snapshots at $t\simeq 15$, showing particle orientations.
  • Figure 4: Dependence of mean streamwise flux on the control frequency $\omega_{\rm c}$. Results are shown for the heuristic policy $\pi_{\rm heur}$ in the transitional regime $g_{\rm o}^\star< g <g_{\rm b}^\star$ ($\blacksquare$) and the ordered regime $g > g_{\rm b}^\star$ ($\bullet$), and for the disruption-only policy $\pi_{\rm disr}$ ($\blacksquare$ and $\bullet$). The vertical dashed line marks the reference control frequency used throughout the manuscript, $\omega_{\rm c} = h\lambda_{\rm S}/L_x$.
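
The captions above refer to the global orientation $\boldsymbol{\Pi} = (\Pi_x, \Pi_y)$ and its magnitude $|\boldsymbol{\Pi}|$, which also span the plane on which the PPO policy maps are drawn. As a minimal sketch, assuming $\boldsymbol{\Pi}$ is the usual Vicsek-type polarization (the average of the particles' unit orientation vectors), it could be computed as follows. The function name `global_polarization` and the representation of orientations as angles in radians are illustrative assumptions, not taken from the paper, whose lattice model may encode orientations differently.

```python
import math


def global_polarization(angles):
    """Return (Pi_x, Pi_y, |Pi|) for a list of orientation angles in radians.

    Assumption: Pi is the mean of the unit orientation vectors
    (cos(theta), sin(theta)), as in standard Vicsek-type models.
    """
    n = len(angles)
    if n == 0:
        raise ValueError("need at least one particle")
    pi_x = sum(math.cos(a) for a in angles) / n
    pi_y = sum(math.sin(a) for a in angles) / n
    return pi_x, pi_y, math.hypot(pi_x, pi_y)


# Hypothetical usage: a fully aligned swarm versus one split evenly
# among four lattice directions.
aligned = global_polarization([0.0] * 10)
balanced = global_polarization([0.0, math.pi / 2, math.pi, 3 * math.pi / 2])
```

Under this definition a fully aligned swarm gives $|\boldsymbol{\Pi}| = 1$, while a swarm split evenly among opposing directions gives $|\boldsymbol{\Pi}| \approx 0$ (up to floating-point rounding), which are the two limits between which the order parameter in Figure 1a varies.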