Is Discretization Fusion All You Need for Collaborative Perception?

Kang Yang, Tianci Bu, Lantao Li, Chunxu Li, Yongcai Wang, Deying Li

TL;DR

This work tackles the inefficiency of discretization-based fusion in collaborative perception by introducing ACCO, an anchor-centric framework that uses DETR-like anchor queries for cross-agent fusion. ACCO comprises an Anchor Featuring Block, an Anchor Confidence Generator, and a local-global fusion module with spatially aware attention mechanisms, enabling iterative refinement across multiple layers. Empirical results on OPV2V and Dair-V2X show ACCO reduces communication while expanding detection range and improving AP, particularly at long ranges. The approach demonstrates stronger long-range perception and robustness to occlusion, indicating practical benefits for multi-agent autonomous systems.

Abstract

Collaborative perception in multi-agent systems enhances overall perceptual capabilities by facilitating the exchange of complementary information among agents. Current mainstream collaborative perception methods rely on discretized feature maps for fusion, which, however, lack flexibility in extracting and transmitting informative features and can hardly focus on those features during fusion. To address these problems, this paper proposes a novel Anchor-Centric paradigm for Collaborative Object detection (ACCO). It avoids grid precision issues and allows more flexible and efficient anchor-centric communication and fusion. ACCO is composed of three main components: (1) an anchor featuring block (AFB) that generates anchor proposals and projects the prepared anchor queries onto image features; (2) an anchor confidence generator (ACG) that minimizes communication by selecting only the features in the confident anchors to transmit; and (3) a local-global fusion module, in which local fusion is anchor alignment-based fusion (LAAF) and global fusion is conducted by spatial-aware cross-attention (SACA). LAAF and SACA run in multiple layers, so agents conduct anchor-centric fusion iteratively to adjust the anchor proposals. Comprehensive experiments on the OPV2V and Dair-V2X datasets demonstrate ACCO's superiority in reducing communication volume and in improving perception range and detection performance. Code is available at https://github.com/sidiangongyuan/ACCO.
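
The confidence-based selection attributed to the ACG can be illustrated with a small sketch. This is not the authors' implementation: the function name `select_confident_anchors`, the fixed threshold rule, and the toy feature values are all assumptions made for illustration, and the actual ACG may score and select anchors differently. The sketch only shows the general idea of transmitting features for confident anchors and dropping the rest.

```python
from typing import List, Sequence, Tuple


def select_confident_anchors(
    features: Sequence[Sequence[float]],
    confidences: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[int], List[List[float]]]:
    """Keep only anchors whose confidence reaches the threshold.

    Hypothetical helper: the threshold rule and all names are assumptions,
    not the ACCO implementation.
    """
    if len(features) != len(confidences):
        raise ValueError("one confidence score is required per anchor")
    # Indices of anchors considered confident enough to transmit.
    kept = [i for i, c in enumerate(confidences) if c >= threshold]
    return kept, [list(features[i]) for i in kept]


# Toy example (made-up values): four anchors with 2-dimensional features.
anchor_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
anchor_confidences = [0.9, 0.2, 0.75, 0.4]

kept_ids, payload = select_confident_anchors(
    anchor_features, anchor_confidences, threshold=0.5
)
# Fraction of anchor features that would actually be sent to collaborators.
sent_ratio = len(payload) / len(anchor_features)
```

In this toy setting only the anchors at indices 0 and 2 pass the threshold, so half of the anchor features would be transmitted. Raising the threshold trades communication volume against the amount of information shared.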

Paper Structure

This paper contains 14 sections, 15 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Fusion process in discretization-based fusion (DF) vs. ACCO. ACCO is more flexible, accurate, and efficient.
  • Figure 2: Framework. The encoder layer of ACCO contains anchor queries, anchor featuring block, spatial-aware self-attention, and local-global fusion. Anchor queries are initialized as a sparse set of proposals in the BEV space. The spatial-aware self-attention encodes the queries with spatial distance. Local-global fusion is a critical component comprising several key elements: the anchor encoder, anchor confidence generator, local anchor alignment-based fusion, and spatial-aware cross-attention. The decoder repeats $L$ times to produce final predictions.
  • Figure 3: Visual comparison of different methods on the OPV2V dataset. Green and red 3D bounding boxes represent the ground truth and predictions, respectively. Blue 3D bounding boxes represent the communicating agents.
  • Figure 4: Analysis of communication bandwidth across different perception distances.
  • Figure 5: The relationship between communication volume and performance.