Collective Behavior Clone with Visual Attention via Neural Interaction Graph Prediction

Kai Li, Zhao Ma, Liang Li, Shiyu Zhao

TL;DR

The paper addresses the challenge of learning both local interaction mechanisms and the collective control policy of a swarm from trajectory data. It introduces CBC, which combines a Graph Variational Autoencoder (GVAE) to infer a time-varying interaction graph with behavioral cloning to learn the policy, complemented by a vision-based neighbor-selection module for decentralized operation. Key contributions include an enhanced GVAE that outperforms baselines in graph prediction and a real-world demonstration on a decentralized, vision-based robot swarm with no inter-robot communication, achieving lower action and trajectory errors. The results validate CBC as a practical framework for understanding swarm dynamics and enabling robust decentralized swarm robotics applications.

Abstract

In this paper, we propose a framework, collective behavioral cloning (CBC), to learn the underlying interaction mechanism and control policy of a swarm system. Given the trajectory data of a swarm system, we propose a graph variational autoencoder (GVAE) to learn the local interaction graph. Based on the interaction graph and swarm trajectory, we use behavioral cloning to learn the control policy of the swarm system. To demonstrate the practicality of CBC, we deploy it on a real-world decentralized vision-based robot swarm system. A visual attention network is trained based on the learned interaction graph for online neighbor selection. Experimental results show that our method outperforms previous approaches in predicting both the interaction graph and swarm actions with higher accuracy. This work offers a promising approach for understanding interaction mechanisms and swarm dynamics in future swarm robotics research. Code and data are available.
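
The second stage described in the abstract, learning a control policy from the interaction graph and the swarm trajectory, can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a 2D swarm, a given 0/1 adjacency matrix standing in for the GVAE output, a hand-made feature (mean relative position of neighbors), and a linear policy fitted by least squares. The names `neighbor_features` and `fit_linear_policy` are hypothetical.

```python
import numpy as np

def neighbor_features(positions, adjacency):
    """Mean relative position of each agent's neighbors.

    positions: (N, 2) array of agent positions.
    adjacency: (N, N) 0/1 array; adjacency[i, j] = 1 means j influences i.
    (Assumed representation; the paper's graph comes from the GVAE.)
    """
    rel = positions[None, :, :] - positions[:, None, :]  # rel[i, j] = p_j - p_i
    weights = adjacency[:, :, None].astype(float)
    # Agents without neighbors get a zero feature instead of a division by zero.
    counts = np.maximum(adjacency.sum(axis=1, keepdims=True), 1)
    return (weights * rel).sum(axis=1) / counts

def fit_linear_policy(features, actions):
    """Behavioral cloning as regression: find W with actions ≈ features @ W."""
    W, *_ = np.linalg.lstsq(features, actions, rcond=None)
    return W

# Synthetic "expert" data: each agent moves toward its neighbors' centroid.
rng = np.random.default_rng(0)
num_agents, num_steps, true_gain = 5, 200, 0.5
feats, acts = [], []
for _ in range(num_steps):
    pos = rng.normal(size=(num_agents, 2))
    adj = (rng.random((num_agents, num_agents)) < 0.4).astype(int)
    np.fill_diagonal(adj, 0)  # no self-interaction
    f = neighbor_features(pos, adj)
    feats.append(f)
    acts.append(true_gain * f)

features = np.concatenate(feats)
actions = np.concatenate(acts)
W = fit_linear_policy(features, actions)  # recovers roughly true_gain * identity
```

The point of the sketch is the data flow: the interaction graph decides which neighbors enter each agent's observation, and the policy is then fitted to the recorded actions by supervised learning. A neural policy would replace the least-squares step without changing that structure.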

Paper Structure

This paper contains 11 sections, 16 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Overview of the CBC framework. The GVAE learns the interaction graph from the trajectory, and behavioral cloning learns the control policy, so that the collective behavior of the swarm can be replicated. Because the robot swarm is vision-based and decentralized, with no wireless communication, a visual attention network is trained on the learned interaction graph to select interacting robots online.
  • Figure 2: The GVAE model takes the swarm trajectory and outputs the local interaction graph edges.
  • Figure 3: Structure of the GVAE model. $\mathbf{X}$ and $\mathbf{\hat{X}}$ denote the input and predicted trajectories, respectively. $\mathbf{z}$ represents the interaction graph edge logits.
  • Figure 4: Architecture of the vision-based robot swarm. The visual attention network is trained using labels generated from the graph edge predictions of the GVAE. Since the system operates in a decentralized manner and wireless communication is not allowed, visual attention is used to select the robots to interact with.
  • Figure 5: The GVAE predicts the interaction graph. For robots that interact, we project their 3D positions onto the image space to create training labels for YOLOv5 and the visual attention network.
  • ...and 5 more figures
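
The label-generation step in the Figure 5 caption, projecting the 3D positions of interacting robots onto the image, can be sketched with a standard pinhole camera model. This is an illustration under stated assumptions rather than the authors' pipeline: points are assumed to be already expressed in the camera frame (z pointing forward), lens distortion is ignored, and the intrinsics `fx, fy, cx, cy`, the function names, and the example values are all hypothetical.

```python
import numpy as np

def project_to_image(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in the camera frame to pixel coordinates.

    Returns None when the point is behind the camera (z <= 0).
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    return (u, v)

def make_attention_labels(points_cam, interacting, intrinsics, image_size):
    """Pixel centers of the robots flagged as interacting by the predicted graph.

    points_cam: (M, 3) positions of the other robots in the camera frame.
    interacting: length-M booleans, one per robot (assumed to come from the
        predicted graph edges for the observing robot).
    intrinsics: (fx, fy, cx, cy); image_size: (width, height) in pixels.
    """
    width, height = image_size
    labels = []
    for point, is_edge in zip(np.asarray(points_cam, dtype=float), interacting):
        if not is_edge:
            continue  # only interacting robots produce labels
        pixel = project_to_image(point, *intrinsics)
        if pixel is None:
            continue  # behind the camera
        u, v = pixel
        if 0 <= u < width and 0 <= v < height:
            labels.append((u, v))  # keep only points inside the image
    return labels

# Hypothetical example: three other robots, two of which interact with the observer.
intrinsics = (500.0, 500.0, 320.0, 240.0)
points = [[0.0, 0.0, 2.0], [0.4, -0.2, 2.0], [5.0, 0.0, 1.0]]
labels = make_attention_labels(points, [True, False, True], intrinsics, (640, 480))
```

In this example the first robot projects to the image center and is kept, the second is skipped because it has no predicted edge, and the third is dropped because its projection falls outside a 640x480 image. Turning pixel centers into bounding boxes for a detector would need the robot's apparent size, which the sketch leaves out.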