What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception

Wanfang Su; Lixing Chen; Yang Bai; Xi Lin; Gaolei Li; Zhe Qu; Pan Zhou

TL;DR

This paper proposes CMiMC (Contrastive Mutual Information Maximization for Collaborative Perception), a framework that uses multi-view contrastive learning to estimate and maximize multi-view mutual information (MVMI), which in turn assists the training of a collaboration encoder for voxel-level feature fusion.

Abstract

Multi-agent perception (MAP) allows autonomous systems to understand complex environments by interpreting data from multiple sources. This paper investigates intermediate collaboration for MAP, focusing on what makes a "good" collaborative view (i.e., the post-collaboration feature) and how it relates to the individual views (i.e., the pre-collaboration features), a relationship that most existing works treat as an opaque procedure. We propose a novel framework named CMiMC (Contrastive Mutual Information Maximization for Collaborative Perception) for intermediate collaboration. The core philosophy of CMiMC is to preserve the discriminative information of individual views in the collaborative view by maximizing mutual information between pre- and post-collaboration features, while enhancing the efficacy of collaborative views by minimizing the loss function of downstream tasks. In particular, we define multi-view mutual information (MVMI) for intermediate collaboration, which evaluates correlations between collaborative views and individual views on both global and local scales. We establish CMiMNet, based on multi-view contrastive learning, to estimate and maximize MVMI, which assists the training of a collaboration encoder for voxel-level feature fusion. We evaluate CMiMC on V2X-Sim 1.0, where it improves the SOTA average precision by 3.08% and 4.44% at IoU (Intersection-over-Union) thresholds of 0.5 and 0.7, respectively. In addition, CMiMC can reduce communication volume to 1/32 while achieving performance comparable to SOTA. Code and Appendix are released at https://github.com/77SWF/CMiMC.
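To make the idea of maximizing mutual information through contrastive learning more concrete, the following is a minimal sketch of an InfoNCE-style objective, a widely used contrastive lower bound on mutual information. It is an illustration only: the abstract does not specify which estimator CMiMNet uses, and the function names (`info_nce_loss`, `mi_lower_bound`), the dot-product critic, and the `temperature` parameter are all assumptions introduced here. Each collaborative-view feature is paired with the individual-view feature at the same index (positive pair), and features at other indices serve as negatives.

```python
import math


def dot(u, v):
    """Plain dot product between two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(collab, individual, temperature=0.1):
    """InfoNCE loss over a batch of feature pairs (hypothetical sketch).

    collab[i] and individual[i] form the positive pair; individual[j]
    with j != i are treated as negatives. The dot-product critic and
    the temperature are assumptions, not taken from the paper.
    """
    n = len(collab)
    total = 0.0
    for i in range(n):
        logits = [dot(collab[i], individual[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the positive pair.
        total += log_denominator - logits[i]
    return total / n


def mi_lower_bound(loss, n):
    """InfoNCE bound: I(collab; individual) >= log(n) - loss."""
    return math.log(n) - loss


if __name__ == "__main__":
    # Toy features: four one-hot "views" that match exactly.
    views = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    loss = info_nce_loss(views, views)
    print("loss:", loss, "MI lower bound:", mi_lower_bound(loss, len(views)))
```

Minimizing this loss raises the lower bound on the mutual information between the two sets of features, which is the general mechanism behind contrastive MI maximization. In a training setup of the kind the abstract describes, such a term would be combined with the downstream task loss; how the two are combined in CMiMC is not stated in this summary.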

Paper Structure (26 sections, 6 equations, 7 figures, 1 table)

Introduction
Related Work
Collaborative Perception
Mutual Information Estimation and Maximization
Methods
The CMiMC Framework
Feature Encoder
Collaboration Encoder
Decoder-Header Module
Multi-view Mutual Information Maximization for Intermediate Collaboration
CMiMNet
MVMI Maximization via Contrastive Learning
Positive and Negative Pairs for CMiMNet
Global and Local MVMI Estimator
Loss Function
...and 11 more sections

Figures (7)

Figure 1: Scenario of intermediate collaboration for LiDAR-based 3D object detection and the framework of CMiMC.
Figure 2: Contrastive learning over positive/negative pairs.
Figure 3: The structure of CMiMNet for estimating and maximizing MVMI.
Figure 4: Performance-bandwidth trade-off of CMiMC.
Figure 5: No Collaboration vs. CMiMC vs. Early Collaboration. Green/Red boxes are ground-truth/predicted boxes.
...and 2 more figures
