Robust Collaborative Perception without External Localization and Clock Devices

Zixing Lei, Zhenyang Ni, Ruize Han, Shuo Tang, Dingju Wang, Chen Feng, Siheng Chen, Yanfeng Wang

TL;DR

This work addresses the vulnerability of collaborative perception systems to localization errors and clock deviations by proposing FreeAlign, a GNSS- and clock-free spatial-temporal alignment module. FreeAlign exploits geometric relations among detected objects that are invariant across agents' viewpoints, using Salient-Object Graph Learning with EdgeGAT, Mass-based Multi-Anchor Subgraph Searching, and a robust transformation calculation to estimate the relative pose and time offset between agents. The approach is validated on the simulated OPV2V and real-world DAIR-V2X LiDAR datasets, where it stays robust to pose noise and latency deviations while device-dependent baselines degrade as their localization and clock signals become noisy. The findings suggest that FreeAlign enables dependable multi-agent perception in environments where external localization and synchronized clocks are unreliable or unavailable, and that it can be integrated into existing perception pipelines.

Abstract

Consistent spatial-temporal coordination across multiple agents is fundamental for collaborative perception, which seeks to improve perception abilities through information exchange among agents. To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals. However, hardware-generated signals can be vulnerable to noise and to potentially malicious attacks, jeopardizing the precision of spatial-temporal alignment. Rather than relying on external hardware, this work proposes a novel approach: aligning agents by recognizing the inherent geometric patterns within their perceptual data. Following this spirit, we propose a robust collaborative perception system that operates independently of external localization and clock devices. The key module of our system, FreeAlign, constructs a salient-object graph for each agent based on its detected boxes and uses a graph neural network to identify common subgraphs between agents, leading to accurate relative pose and time. We validate FreeAlign on both real-world and simulated datasets. The results show that the FreeAlign-empowered robust collaborative perception system performs comparably to systems relying on precise localization and clock devices.
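
To make the last step of this pipeline concrete, here is a minimal sketch of how a relative pose could be recovered once two agents agree on which detected objects are the same. It is not the paper's implementation: it assumes the common subgraph has already been found, reduces each matched box to a 2D center, and uses a standard closed-form least-squares fit of a rotation and translation. The function names `pairwise_distances`, `estimate_rigid_2d`, and `apply_rigid_2d` are hypothetical. The first function illustrates the geometric invariance the approach relies on: distances between objects are the same in every agent's frame.

```python
import math


def pairwise_distances(points):
    """Distances between all point pairs; unchanged by rotation and translation."""
    return [
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]


def estimate_rigid_2d(src, dst):
    """Least-squares 2D rotation angle and translation mapping src onto dst.

    Assumes src[i] and dst[i] are centers of the same object as seen by two
    agents (i.e. correspondences are already known).
    """
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need at least two matched point pairs")

    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n

    # Accumulate dot and cross products of the centered points.
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # Translation moves the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)


def apply_rigid_2d(theta, t, points):
    """Rotate points by theta (radians), then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]
```

With noisy detections or a few wrong matches, a fit like this would normally be wrapped in an outlier-rejection scheme; the paper's own "robust transformation calculation" is not reproduced here.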

Paper Structure (14 sections, 3 equations, 5 figures, 6 tables)

This paper contains 14 sections, 3 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: An illustration of collaborative perception in autonomous driving. The upper part shows that, in the traditional approach, inaccurate localization and clock signals mislead the collaborative perception system. The lower part shows that the proposed FreeAlign enables robust collaborative perception that operates without external devices for localization or clock synchronization. The core of FreeAlign is to associate the same objects perceived by multiple agents based on similar geometric structures among those objects.
  • Figure 2: Overview of the proposed robust collaborative perception framework. The key module is FreeAlign, which leverages salient-object graphs to achieve spatial-temporal alignment.
  • Figure 3: Detection performance on the OPV2V and DAIR-V2X datasets with Gaussian pose noise applied during testing. The performance of the baselines drops significantly as noise increases, while FreeAlign's results remain stable.
  • Figure 4: FreeAlign qualitatively outperforms V2X-ViT and CoAlign on the OPV2V dataset under the noisy-pose setting. Green and red boxes denote ground truth and detections, respectively.
  • Figure 5: Visualization of collaboration between the ego vehicle (a) and the edge vehicle (b-c), and their spatial-temporal fusion result (d). FreeAlign achieves spatial-temporal alignment through a common subgraph shared by views captured at different times and locations. Green boxes denote ground truth and red boxes denote detections; yellow and blue denote the point clouds collected by the ego and edge vehicles, respectively. White denotes the common subgraph.