LPAC: Learnable Perception-Action-Communication Loops with Applications to Coverage Control

Saurav Agarwal; Ramya Muthukrishnan; Walker Gosrich; Vijay Kumar; Alejandro Ribeiro

TL;DR

The paper tackles decentralized coverage control for robot swarms operating in environments with an unknown, static importance density field Φ. It introduces LPAC, a Learnable Perception-Action-Communication loop that combines a CNN-based perception module, a Graph Neural Network (GNN)-based communication module, and a shallow MLP action module, trained via imitation learning from a clairvoyant centroidal Voronoi tessellation (CVT) planner. Results show that LPAC outperforms both centralized and decentralized CVT baselines, generalizes to larger numbers of robots and features, transfers to larger environments without retraining, and remains robust to position noise. The evaluation also covers real-world data and sim-to-real demonstrations. These findings support the viability of learnable PAC architectures for scalable, robust, and transferable decentralized navigation in robot swarms, with potential applicability to other multi-agent reasoning tasks.

Abstract

Coverage control is the problem of navigating a robot swarm to collaboratively monitor features or a phenomenon of interest not known a priori. The problem is challenging in decentralized settings with robots that have limited communication and sensing capabilities. We propose a learnable Perception-Action-Communication (LPAC) architecture for the problem, wherein a convolutional neural network (CNN) processes localized perception; a graph neural network (GNN) facilitates robot communications; finally, a shallow multi-layer perceptron (MLP) computes robot actions. The GNN enables collaboration in the robot swarm by computing what information to communicate with nearby robots and how to incorporate received information. Evaluations show that the LPAC models -- trained using imitation learning -- outperform standard decentralized and centralized coverage control algorithms. The learned policy generalizes to environments different from the training dataset, transfers to larger environments with more robots, and is robust to noisy position estimates. The results indicate the suitability of LPAC architectures for decentralized navigation in robot swarms to achieve collaborative behavior.
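The abstract's description of the GNN (each robot decides what to send and how to fold in what it receives) can be illustrated with a minimal sketch of one graph-convolution layer followed by a pointwise nonlinearity. This assumes a polynomial graph filter of the form Z = Σ_k S^k X H_k, which is one common choice and may differ from the authors' exact filter. The names `matmul` and `graph_filter_layer`, the toy graph, and the weights are hypothetical and chosen only for illustration.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def graph_filter_layer(S, X, H, activation=math.tanh):
    """One GNN layer: Z = sum_k S^k X H[k], then a pointwise nonlinearity.

    S: n x n communication-graph matrix; S[i][j] != 0 only if robots i and j are neighbors.
    X: n x f_in matrix, one feature row per robot.
    H: list of K+1 weight matrices, each f_in x f_out (assumed, illustrative values).
    """
    n, f_out = len(X), len(H[0][0])
    Z = [[0.0] * f_out for _ in range(n)]
    shifted = X  # S^0 X: each robot's own features
    for k, H_k in enumerate(H):
        if k > 0:
            # One more exchange with direct neighbors gives k-hop information.
            shifted = matmul(S, shifted)
        term = matmul(shifted, H_k)
        for i in range(n):
            for j in range(f_out):
                Z[i][j] += term[i][j]
    return [[activation(z) for z in row] for row in Z]


# Toy example: three robots on a line (0 - 1 - 2); only robot 0 observes a feature.
S = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0], [0.0], [0.0]]
H = [[[1.0]], [[0.5]], [[0.25]]]  # weights for 0-, 1-, and 2-hop information
print(graph_filter_layer(S, X, H, activation=lambda z: z))
# -> [[1.25], [0.5], [0.25]]
```

With an identity activation, robot 2 ends up with a non-zero value even though it is not a direct neighbor of robot 0: the information reaches it through robot 1 over two rounds of neighbor-to-neighbor exchange, which is the property that makes such a layer executable in a decentralized way.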

Paper Structure (28 sections, 21 equations, 20 figures)

Introduction
Related Work
Graph Neural Networks for Navigation Problems
Multi-Agent Reinforcement Learning (MARL)
Coverage Control in Multi-Robot Systems
Problem Statement
Decentralized Navigation Control
Coverage Control Problem
Learnable PAC Architecture
Perception Module
Communication with Graph Neural Networks
Action Module
Environment and Imitation Learning
Centroidal Voronoi Tessellation
Coverage Control Environment
...and 13 more sections

Figures (20)

Figure 1: The proposed learnable Perception-Action-Communication (LPAC) architecture for decentralized navigation of robot swarms: (1) In the perception module, a convolutional neural network (CNN) processes maps representing localized observations and generates an abstract representation. (2) In the communication module, a graph neural network (GNN) performs computations on the output of the perception module and the messages received from neighboring robots. It generates a fixed-size message to share with nearby robots and aggregates the received information to generate a feature vector for the action module of the robot. (3) In the action module, a shallow multilayer perceptron (MLP) computes the control actions for the robot based on the output generated by the GNN. The three modules are executed on each robot independently, with the GNN in the communication module facilitating collaboration between robots.
Figure 2: A near-optimal solution to the coverage control problem: A team of 32 robots is deployed in an environment of size 1024 m $\times$ 1024 m. The importance density field (IDF) is composed of 32 features, each represented as a Gaussian. Robots position themselves to provide sensor coverage to the features of interest. The green lines represent the Voronoi partition of the environment with respect to the robot positions. Each robot is closer to every point in its Voronoi region than any other robot is.
Figure 3: The four channels of the CNN input image in the perception module. All channels are ego-centric to the robot and are of size 32$\times$32. The first channel (a) represents the IDF observed by the robot in its local vicinity. The second channel (b) represents the boundary of the environment; it has non-zero values only when the robot is close to the boundary. The third (c) and fourth (d) channels represent the positions of the neighbors of the robot. For each neighbor, the pixels in these channels corresponding to the relative position of the neighbor have non-zero values. Channel (c) represents the $x$-coordinates of the neighbors, and channel (d) represents the $y$-coordinates, both normalized by the communication range.
Figure 4: An example of the GNN architecture used in the communication module. The architecture is composed of $L=3$ layers of graph convolution filters (red boxes) followed by pointwise nonlinearities (blue boxes).
Figure 5: Distributed implementation of the GNN architecture for a robot $i$. (a) The communication graph highlights the neighboring robots, i.e., $\mathcal{N}(i) = \{1,2,3\}$. The robot $i$ receives aggregated messages $\mathbf{Y}_j=\{(\mathbf{y}_j)_{lk}\}, \forall j\in\{1,2,3\}$ from the neighboring robots. (b) For a layer $l$ of the GNN, the output of the previous layer $(\mathbf{x}_i)_{l-1}$ and the aggregated messages are processed by the graph convolution filter to generate the output $(\mathbf{z}_i)_l$, which is then processed by the pointwise nonlinearity to generate the output $(\mathbf{x}_i)_l$. The convolution filter also generates the aggregated message $\mathbf{Y}_i$ to be sent to the neighboring robots.
...and 15 more figures
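
The Voronoi partition described in the Figure 2 caption, together with the CVT planner mentioned in the TL;DR, can be illustrated with a small grid-based sketch: each grid cell is assigned to its nearest robot, and each robot's IDF-weighted centroid is computed over its own cells. This is a discretized approximation under assumed parameters, not the authors' planner. The function name `voronoi_centroids`, the grid step, and the uniform IDF in the example are hypothetical.

```python
def voronoi_centroids(robots, idf, size, step=1.0):
    """Return the IDF-weighted centroid of each robot's Voronoi region.

    robots: list of (x, y) positions.
    idf:    function (x, y) -> non-negative importance value (assumed known here).
    size:   side length of the square environment.
    step:   grid resolution used to approximate the integrals (an assumption).
    """
    count = len(robots)
    mass = [0.0] * count
    mx = [0.0] * count
    my = [0.0] * count
    cells = int(size / step)
    for ix in range(cells):
        for iy in range(cells):
            x, y = (ix + 0.5) * step, (iy + 0.5) * step  # cell center
            # The nearest robot owns this cell (Voronoi region membership).
            owner = min(
                range(count),
                key=lambda r: (robots[r][0] - x) ** 2 + (robots[r][1] - y) ** 2,
            )
            w = idf(x, y)
            mass[owner] += w
            mx[owner] += w * x
            my[owner] += w * y
    # A robot whose region carries no importance keeps its current position.
    return [
        (mx[r] / mass[r], my[r] / mass[r]) if mass[r] > 0 else robots[r]
        for r in range(count)
    ]


# Two robots in a 4 x 4 environment with a uniform IDF.
robots = [(1.0, 2.0), (3.0, 2.0)]
print(voronoi_centroids(robots, lambda x, y: 1.0, size=4.0))
# -> [(1.0, 2.0), (3.0, 2.0)]
```

In this example each robot already sits at the centroid of its own region, which is the defining property of a centroidal Voronoi tessellation. A Lloyd-style planner would repeatedly move each robot toward its centroid until the positions stop changing.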

Theorems & Definitions (1)

Remark
