Robust Collaborative Perception without External Localization and Clock Devices
Zixing Lei, Zhenyang Ni, Ruize Han, Shuo Tang, Dingju Wang, Chen Feng, Siheng Chen, Yanfeng Wang
TL;DR
This work addresses the vulnerability of collaborative perception systems to localization errors and clock deviations by proposing FreeAlign, a GNSS- and clock-free spatial-temporal alignment module. FreeAlign leverages invariant geometric relations through Salient-Object Graph Learning with EdgeGAT, Mass-based Multi-Anchor Subgraph Searching, and a robust transformation calculation to estimate relative pose and timing between agents. The approach is validated on simulated OPV2V and real DAIR-V2X LiDAR datasets, demonstrating robustness to pose noise and latency deviations and improving performance over device-dependent baselines. The findings suggest FreeAlign enables dependable multi-agent perception in environments where external localization and synchronized clocks are unreliable or unavailable, with seamless integration into existing perception pipelines.
Abstract
Consistent spatial-temporal coordination across multiple agents is fundamental to collaborative perception, which seeks to improve perception through information exchange among agents. To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals. However, hardware-generated signals can be vulnerable to noise and malicious attacks, jeopardizing the precision of spatial-temporal alignment. Rather than relying on external hardware, this work proposes a novel approach: aligning agents by recognizing the inherent geometric patterns within their perceptual data. Following this spirit, we propose a robust collaborative perception system that operates independently of external localization and clock devices. The key module of our system, \emph{FreeAlign}, constructs a salient object graph for each agent based on its detected boxes and uses a graph neural network to identify common subgraphs between agents, yielding accurate estimates of relative pose and time. We validate \emph{FreeAlign} on both real-world and simulated datasets. The results show that the \emph{FreeAlign}-empowered robust collaborative perception system performs comparably to systems relying on precise localization and clock devices.
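The geometric idea behind the alignment can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that box centers are 2D points and that correspondences between the two agents' boxes are already known (in the paper, finding them is the job of the graph matching stage). The closed-form least-squares fit shown here is a standard technique substituted for illustration and stands in for, rather than reproduces, the paper's robust transformation calculation. The function names `pairwise_distances`, `estimate_rigid_2d`, and `apply_rigid_2d` are hypothetical.

```python
import math


def pairwise_distances(centers):
    """Distances between all pairs of box centers.

    These edge lengths do not change under rotation or translation,
    so two agents observing the same objects obtain the same values
    even though their coordinate frames differ.
    """
    n = len(centers)
    return [[math.dist(centers[i], centers[j]) for j in range(n)]
            for i in range(n)]


def estimate_rigid_2d(src, dst):
    """Least-squares 2D rotation angle and translation mapping src[i] to dst[i].

    Assumes src and dst are already matched, index by index.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy   # centered source point
        bx, by = qx - dx, qy - dy   # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # angle maximizing the alignment score
    c, s = math.cos(theta), math.sin(theta)
    # The translation maps the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)


def apply_rigid_2d(theta, t, point):
    """Rotate a point by theta, then translate it by t."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + t[0], s * x + c * y + t[1])


if __name__ == "__main__":
    # Hypothetical box centers seen by agent A, and the same objects in agent B's frame.
    boxes_a = [(0.0, 0.0), (4.0, 1.0), (2.0, 5.0), (-3.0, 2.0)]
    boxes_b = [apply_rigid_2d(0.5, (3.0, -2.0), p) for p in boxes_a]
    theta, t = estimate_rigid_2d(boxes_a, boxes_b)
    print("estimated rotation (rad):", theta)
    print("estimated translation:", t)
```

With noise-free correspondences, the fit recovers the transform used to generate `boxes_b`. With noisy or partly wrong matches, a robust variant would be needed, which is the role the paper assigns to its own transformation step.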
