Sparse shepherding control of large-scale multi-agent systems via Reinforcement Learning

Luigi Catello, Italo Napolitano, Davide Salzano, Mario di Bernardo

TL;DR

This work tackles sparse indirect control of large multi-agent systems by coupling a micro-level ODE model for a few controlled herders with a macro-level mean-field PDE for the target density, with the herders driven by a PPO-trained macro–micro controller. The method introduces a steady-state target density estimator and an adaptive interaction strength law $K(t)$ to improve performance under actuation sparsity, steering the target density toward a von Mises distribution with concentration $\kappa$. Key contributions include the coupled ODE–PDE formulation on a periodic domain, a practical steady-state density estimator, an adaptive gain mechanism, and extensive numerical validation showing robust density control with low control effort and a reduced need for online optimization. The approach offers a scalable, real-time pathway for sparse shepherding in large populations and suggests avenues for decentralization and real-world robot swarm deployment.
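
As a minimal sketch of the von Mises target density mentioned above, the snippet below evaluates it on a uniform grid and checks that it integrates to one over a single period. The domain $[-\pi, \pi)$, the grid size, the mean `mu`, and the value `kappa=2.0` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def von_mises_density(x, mu=0.0, kappa=1.0):
    """Von Mises density on the circle, normalised to integrate to 1 over one period."""
    # np.i0 is the modified Bessel function of the first kind, order 0.
    return np.exp(kappa * np.cos(x - mu)) / (2.0 * np.pi * np.i0(kappa))


# Uniform grid on the (assumed) periodic domain [-pi, pi).
# The right endpoint is excluded so the periodic point is not counted twice.
n_cells = 256
x = np.linspace(-np.pi, np.pi, n_cells, endpoint=False)
dx = 2.0 * np.pi / n_cells

# Hypothetical target: centred at 0 with concentration kappa = 2.
rho_target = von_mises_density(x, mu=0.0, kappa=2.0)

# Rectangle rule, which is very accurate for smooth periodic integrands.
total_mass = rho_target.sum() * dx
```

Larger `kappa` concentrates the mass more tightly around `mu`, while `kappa = 0` recovers the uniform density $1/(2\pi)$.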

Abstract

We propose a reinforcement learning framework for sparse indirect control of large-scale multi-agent systems, where few controlled agents shape the collective behavior of many uncontrolled agents. The approach addresses this multi-scale challenge by coupling ODEs (modeling controlled agents) with a PDE (describing the uncontrolled population density), capturing how microscopic control achieves macroscopic objectives. Our method combines model-free reinforcement learning with adaptive interaction strength compensation to overcome sparse actuation limitations. Numerical validation demonstrates effective density control, with the system achieving target distributions while maintaining robustness to disturbances and measurement noise, confirming that learning-based sparse control can replace computationally expensive online optimization.

Paper Structure

This paper contains 11 sections, 16 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Proposed control architecture. The macro-micro controller computes the control actions $\mathbf{u}(t)$ knowing the desired target distribution $\bar{\rho}^\mathrm{T}(x)$ and sensing the current herders' position $\mathbf{H}(t)$. The herders influence the target distribution $\rho^\mathrm{T}(x,t)$ through the velocity field $V(x,t)$.
  • Figure 2: Episodic reward during the training process of the PPO agent. Values are smoothed with a moving average of width 20 steps.
  • Figure 3: Performance of the learned policy using the reward function in \ref{eq:reward_rhoTss} and the compensation law in \ref{eq:adaptive_K}. (a) $\mathcal{L}^2$ norm of the target distribution error $e^\mathrm{T}$ (red) and Euclidean norm of the control effort $\left\lVert \mathbf{u} \right\rVert_2$ (blue) over time; (b) Top panel: Desired target density distribution $\bar{\rho}^\mathrm{T}$ in space. Bottom panel: Evolution of the target density and of the herders' positions in space (x axis) and time (y axis) for a representative experiment. The shade of magenta represents the density of the targets. The dark and light blue lines show the position of the herders at each time instant; (c) boxplots of the settling time (2%) for the target population (red, left) and the herders (blue, right), calculated over 10 experiments with random initial conditions; and (d) boxplots of the steady-state target density error $e^\mathrm{T}_{\mathrm{ss}}$ (red, left) and the control effort $\mathrm{u}_m$ (blue, right), calculated over 10 experiments with random initial conditions.
  • Figure 4: Evolution of $\left\lVert e^\mathrm{T}(x,t) \right\rVert_{\mathcal{L}^2}$ when the compensation law for the gain $K(t)$ is activated for $t \ge 75$ (indicated by a vertical black dotted line). (a) $\mathcal{L}^2$ norm of $e^\mathrm{T}$ with (solid line) and without (dotted line) compensation, highlighting the advantage of the compensation law. (b) Time evolution of the interaction strength gain $K(t)$.
  • Figure 5: Robustness analysis of the proposed control strategy against constant disturbances (panels a, b) and measurement noise (panels c, d). (a) Steady-state error $e^\mathrm{T}_{\mathrm{ss}}$ as a function of the constant disturbance acting on the herders' dynamics. (b) Top panel: Desired target density distribution $\bar{\rho}^\mathrm{T}$ in space. Bottom panel: Evolution of the target density and of the herders' positions in space (x axis) and time (y axis) for a representative experiment with $v_d=0.6$. The shade of magenta represents the density of the targets. The dark and light blue lines show the position of the herders at each time instant. (c) Mean (solid blue line) and 10th and 90th percentiles (shaded blue) of the steady-state target error as the amplitude of the measurement noise varies, over 100 numerical experiments. (d) Top panel: Desired target density distribution $\bar{\rho}^\mathrm{T}$ in space. Bottom panel: Evolution of the target density and of the herders' positions in space (x axis) and time (y axis) for a representative example with $D_m=2\pi/5$. The shade of magenta represents the density of the targets. The dark and light blue lines show the position of the herders at each time instant.
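
The captions above refer to three post-processing quantities: a moving average of width 20, the $\mathcal{L}^2$ norm of the density error, and a 2% settling time. The sketch below shows one way these could be computed from sampled data. It is not the authors' code: the function names are hypothetical, the integral is approximated with a rectangle rule on a uniform grid, and the settling-time definition (the first instant after which the signal stays within 2% of its initial-to-final excursion) is an assumption, since the summary does not state the exact convention used.

```python
import numpy as np


def l2_error_norm(rho, rho_target, dx):
    """Discrete L2 norm of the density error on a uniform grid (rectangle rule)."""
    return np.sqrt(np.sum((rho - rho_target) ** 2) * dx)


def moving_average(values, width=20):
    """Moving average over `width` samples, returning only fully covered windows."""
    kernel = np.ones(width) / width
    return np.convolve(values, kernel, mode="valid")


def settling_time(t, signal, band=0.02):
    """First time after which |signal - final| stays within band * |initial - final|.

    Assumed convention: the final value is taken as the last sample of `signal`.
    """
    final = signal[-1]
    tol = band * abs(signal[0] - final)
    outside = np.abs(signal - final) > tol
    if not outside.any():
        return t[0]
    last_outside = np.nonzero(outside)[0][-1]
    # The last sample is always inside the band, so last_outside + 1 is a valid index.
    return t[last_outside + 1]


# Illustrative usage on synthetic data (not results from the paper).
n_cells = 100
dx = 2.0 * np.pi / n_cells
err = l2_error_norm(np.ones(n_cells), np.zeros(n_cells), dx)

smoothed = moving_average(np.ones(50), width=20)

t = np.linspace(0.0, 10.0, 1001)
ts = settling_time(t, np.exp(-t), band=0.02)
```

With `mode="valid"` the smoothed series is shorter than the input by `width - 1` samples, which avoids edge effects at the start and end of a training curve.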