Self-Localized Collaborative Perception

Zhenyang Ni, Zixing Lei, Yifan Lu, Dingju Wang, Chen Feng, Yanfeng Wang, Siheng Chen

TL;DR

Self-localized collaborative perception addresses the vulnerability of external localization to noise and attacks by deriving relative poses from perception data alone. The authors introduce BEVGlue, a spatial alignment module, and CoBEVGlue, an end-to-end self-localized collaboration system; the system uses object-graph modeling and temporally consistent maximum common subgraph detection to align and fuse bird's-eye-view (BEV) features across agents. They demonstrate state-of-the-art robustness under localization noise and spoofing attacks on the simulated OPV2V dataset and the real-world DAIR-V2X and V2V4Real datasets, and show that BEVGlue improves other methods by an average of 57.7% with minimal bandwidth. The work advances practical multi-agent perception by delivering localization-free collaboration with high efficiency and accuracy.
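The object-graph matching named in the TL;DR can be illustrated with a small sketch, under assumptions that are ours rather than the paper's: each node is a BEV box centre, the only edge feature is the centre-to-centre distance (Figure 3 indicates the paper's edge features contain more variables), and the maximum common subgraph is found as the maximum clique of a graph of distance-consistent correspondences, using Bron-Kerbosch. Temporal consistency is left out, and the function names are hypothetical, not taken from the CoBEVGlue code.

```python
import math
from itertools import combinations


def edge_lengths(centers):
    """Hypothetical object graph: nodes are box centres (x, y), and each edge
    feature is the Euclidean distance between two centres."""
    return {
        (i, j): math.dist(centers[i], centers[j])
        for i, j in combinations(range(len(centers)), 2)
    }


def max_common_subgraph(ego, other, tol=0.5):
    """Return the largest sorted list of (ego_idx, other_idx) correspondences
    whose pairwise distances agree within `tol` metres (maximum clique of a
    correspondence-consistency graph)."""
    d_ego, d_other = edge_lengths(ego), edge_lengths(other)

    def dist(d, a, b):
        return d[(a, b)] if a < b else d[(b, a)]

    # Candidate correspondences: every (ego node, collaborator node) pair.
    cands = [(i, j) for i in range(len(ego)) for j in range(len(other))]

    # Two candidates are compatible if they are one-to-one and preserve distance.
    adj = {c: set() for c in cands}
    for a, b in combinations(cands, 2):
        if a[0] != b[0] and a[1] != b[1]:
            if abs(dist(d_ego, a[0], b[0]) - dist(d_other, a[1], b[1])) < tol:
                adj[a].add(b)
                adj[b].add(a)

    best = []

    def bron_kerbosch(r, p, x):
        # Enumerate maximal cliques and keep the largest one seen so far.
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = r
            return
        for v in list(p):
            bron_kerbosch(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    bron_kerbosch([], set(cands), set())
    return sorted(best)
```

Because distances are invariant to rotation and translation, the two agents' graphs can be compared before any relative pose is known. The exhaustive clique search is exponential in the worst case, so this sketch only suits a handful of objects per agent.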

Abstract

Collaborative perception has garnered considerable attention due to its capacity to address several inherent challenges in single-agent perception, including occlusion and out-of-range issues. However, existing collaborative perception systems rely heavily on precise localization systems to establish a consistent spatial coordinate system between agents. This reliance makes them susceptible to large pose errors or malicious attacks, resulting in substantial reductions in perception performance. To address this, we propose $\mathtt{CoBEVGlue}$, a novel self-localized collaborative perception system that achieves more holistic and robust collaboration without using an external localization system. The core of $\mathtt{CoBEVGlue}$ is a novel spatial alignment module that provides the relative poses between agents by matching co-visible objects across agents. We validate our method on both real-world and simulated datasets. The results show that i) $\mathtt{CoBEVGlue}$ achieves state-of-the-art detection performance under arbitrary localization noise and attacks; and ii) the spatial alignment module can seamlessly integrate with a majority of previous methods, enhancing their performance by an average of $57.7\%$. Code is available at https://github.com/VincentNi0107/CoBEVGlue.
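Once co-visible objects have been matched, a relative pose can be recovered from the matched positions alone. The sketch below assumes that only 2D box centres are used and solves for the rotation and translation with the closed-form 2D least-squares (Kabsch-style) solution; the paper may use box headings or a different solver, and `relative_pose_2d` is a hypothetical name.

```python
import math


def relative_pose_2d(src, dst):
    """Least-squares rigid transform (theta, tx, ty) mapping src points onto dst.

    src: matched object centres in the collaborator frame.
    dst: the same objects' centres in the ego frame (same order as src).
    """
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n

    # Accumulate dot and cross products of the centred point pairs; the
    # rotation maximising the alignment is atan2(sum of cross, sum of dot).
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)

    # Translation maps the rotated source centroid onto the destination centroid.
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty
```

For example, three collaborator-frame centres (0, 0), (4, 0), (0, 3) that appear at (5, 1), (5, 5), (2, 1) in the ego frame yield a yaw of π/2 and a translation of (5, 1). At least two distinct matched centres are needed for the rotation to be defined.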

Paper Structure

This paper contains 17 sections, 4 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Noise is pervasive in external localization systems, leading to substantial reductions in the detection performance of collaborative perception systems. (a) and (b) show snapshots from the real-world collaborative perception datasets DAIR-V2X yu2022dair and V2V4Real xu2023v2v4real. Red denotes the point cloud and ground-truth bounding boxes of the ego agent; blue denotes those of the collaborator. The collaborator's point cloud and bounding boxes are transformed into the ego agent's coordinate system using the ground-truth poses. Despite resource-intensive offline calibration, the ground-truth localization error persists at the meter level. (c) shows that, under large localization noise on DAIR-V2X, current collaborative perception systems fail to surpass the no-collaboration baseline. In comparison, our $\mathtt{CoBEVGlue}$ achieves state-of-the-art detection performance under localization noise, performing comparably to systems that rely on precise localization information.
  • Figure 2: Overview of the proposed self-localized collaborative perception framework. The key module is $\mathtt{BEVGlue}$, which uses object graphs and temporally consistent maximum common subgraph (MCS) detection to achieve spatial alignment.
  • Figure 3: Illustration of the proposed object graph modeling. The ego agent and its collaborator are collaborating at a T-junction. Three co-visible objects are connected by red lines representing their spatial relationships. The right portion visualizes the meaning of the variables in the edge features.
  • Figure 4: Comparison of collaborative perception using our $\mathtt{BEVGlue}$ versus point cloud registration. $\mathtt{BEVGlue}$ shows significant advantages over the baseline methods in both detection and communication efficiency, and it operates without an initial pose. The initial poses provided to ICP are perturbed with Gaussian noise of varying levels. Feature points are sampled so that the transmitted data fits within the predefined communication-volume constraint.
  • Figure 5: CoBEVGlue qualitatively outperforms V2X-ViT xu2022v2x and CoAlign lu2023robust on the OPV2V dataset under noisy localization. Green and red boxes denote ground truth and detections, respectively.
  • ...and 1 more figure
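Figures 1 and 4 concern localization noise, and Figure 4 states that the initial poses given to ICP are perturbed with Gaussian noise. The following sketch assumes independent zero-mean Gaussian noise on x and y (in metres) and on yaw (in degrees); the noise levels, the 2D pose convention, and all function names are illustrative assumptions, not the paper's evaluation code.

```python
import math
import random


def transform(pose, point):
    """Apply a 2D pose (x, y, yaw in radians) to a point in the collaborator frame."""
    x, y, yaw = pose
    px, py = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * px - s * py, y + s * px + c * py)


def add_pose_noise(pose, sigma_xy, sigma_yaw_deg, rng):
    """Perturb a pose with independent zero-mean Gaussian noise on x, y and yaw."""
    x, y, yaw = pose
    return (
        x + rng.gauss(0.0, sigma_xy),
        y + rng.gauss(0.0, sigma_xy),
        yaw + math.radians(rng.gauss(0.0, sigma_yaw_deg)),
    )


def misplacement(true_pose, noisy_pose, point):
    """Distance between where an object lands in the ego frame under the
    true pose and under the noisy pose."""
    return math.dist(transform(true_pose, point), transform(noisy_pose, point))
```

For instance, a 1° yaw error alone moves an object 50 m away by about 0.87 m (`misplacement((0, 0, 0), (0, 0, math.radians(1)), (50, 0))`), which illustrates why even small heading errors can misalign fused features from distant collaborators.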