Is Discretization Fusion All You Need for Collaborative Perception?

Kang Yang; Tianci Bu; Lantao Li; Chunxu Li; Yongcai Wang; Deying Li

TL;DR

This work tackles the inefficiency of discretization-based fusion in collaborative perception by introducing ACCO, an anchor-centric framework that uses DETR-like anchor queries for cross-agent fusion. ACCO comprises an Anchor Featuring Block, an Anchor Confidence Generator, and a local-global fusion module with spatially aware attention mechanisms, enabling iterative refinement across multiple layers. Empirical results on OPV2V and Dair-V2X show ACCO reduces communication while expanding detection range and improving AP, particularly at long ranges. The approach demonstrates stronger long-range perception and robustness to occlusion, indicating practical benefits for multi-agent autonomous systems.

Abstract

Collaborative perception in multi-agent systems enhances overall perceptual capability by letting agents exchange complementary information. Current mainstream collaborative perception methods fuse discretized feature maps, which limits flexibility in extracting and transmitting informative features and makes it hard to focus on those features during fusion. To address these problems, this paper proposes a novel Anchor-Centric paradigm for Collaborative Object detection (ACCO). It avoids grid precision issues and allows more flexible and efficient anchor-centric communication and fusion. ACCO is composed of three main components: (1) an anchor featuring block (AFB), which generates anchor proposals and projects the prepared anchor queries onto image features; (2) an anchor confidence generator (ACG), which minimizes communication by selecting only the features of confident anchors for transmission; and (3) a local-global fusion module, in which local fusion is anchor alignment-based fusion (LAAF) and global fusion is performed by spatial-aware cross-attention (SACA). LAAF and SACA run over multiple layers, so agents perform anchor-centric fusion iteratively to refine the anchor proposals. Comprehensive experiments on the OPV2V and Dair-V2X datasets demonstrate ACCO's superiority in reducing communication volume and in improving perception range and detection performance. Code can be found at: https://github.com/sidiangongyuan/ACCO
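
To make the idea behind the anchor confidence generator concrete, the following is a minimal sketch of confidence-based anchor selection before transmission. It is not ACCO's implementation: the function name `select_confident_anchors`, the fixed `threshold`, and the optional `max_anchors` budget are hypothetical choices made for illustration, and the abstract does not specify how ACG scores or filters anchors.

```python
from typing import List, Optional, Sequence, Tuple


def select_confident_anchors(
    confidences: Sequence[float],
    features: Sequence[Sequence[float]],
    threshold: float = 0.5,
    max_anchors: Optional[int] = None,
) -> Tuple[List[int], List[List[float]]]:
    """Pick the anchors an agent would transmit (illustrative only).

    Assumptions, not taken from the paper: one scalar confidence per
    anchor, a fixed threshold, and an optional cap on the number of
    anchors sent.
    """
    if len(confidences) != len(features):
        raise ValueError("need exactly one confidence per anchor feature")

    # Keep only anchors whose confidence reaches the threshold.
    kept = [i for i, c in enumerate(confidences) if c >= threshold]

    # Rank the survivors from most to least confident.
    kept.sort(key=lambda i: confidences[i], reverse=True)

    # Optionally enforce a communication budget.
    if max_anchors is not None:
        kept = kept[:max_anchors]

    # Transmit anchor indices together with their feature vectors.
    return kept, [list(features[i]) for i in kept]


if __name__ == "__main__":
    conf = [0.9, 0.2, 0.7, 0.55]
    feats = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
    indices, payload = select_confident_anchors(conf, feats, max_anchors=2)
    print(indices, payload)
```

Under these assumptions, the transmitted payload grows with the number of confident anchors rather than with the size of a feature grid, which is the intuition behind the reduced communication volume claimed above.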
