AgentAlign: Misalignment-Adapted Multi-Agent Perception for Resilient Inter-Agent Sensor Correlations
Zonglin Meng, Yun Zhang, Zhaoliang Zheng, Zhihao Zhao, Jiaqi Ma
TL;DR
AgentAlign tackles the fragility of inter-agent sensor correlations in real-world cooperative perception by introducing a cross-modality feature alignment space (CFAS) and a heterogeneous agent feature alignment (HAFA) mechanism that adaptively harmonize multi-modal features across diverse V2X agents. The framework densifies LiDAR-camera interactions through a depth variation map and attention-guided fusion, enabling resilient inter-agent sensing under multifactorial noise. A new V2XSet-Noise dataset is built to systematically evaluate robustness to calibration errors, wind-induced vibration, perspective distortions, time-synchronization errors, and systematic biases. Experiments on the V2X-Real and V2XSet-Noise benchmarks show state-of-the-art performance and strong robustness, and extensive ablations isolate the contributions of CFAS, HAFA, and depth-based alignment. The work advances practical cooperative perception for automated vehicles and smart infrastructure and provides a controllable evaluation platform for real-world sensor imperfections.
Abstract
Cooperative perception has attracted wide attention for its ability to leverage information shared across connected automated vehicles (CAVs) and smart infrastructure to overcome sensing occlusion and limited sensing range. However, existing research overlooks the fragility of multi-sensor correlations in multi-agent settings: heterogeneous agent sensor measurements are highly susceptible to environmental factors, which weakens inter-agent sensor interactions. Varying operational conditions and other real-world factors inevitably introduce multifactorial noise and consequently lead to multi-sensor misalignment, making the real-world deployment of multi-agent multi-modality perception particularly challenging. In this paper, we propose AgentAlign, a real-world heterogeneous agent cross-modality feature alignment framework, to address these multi-modality misalignment issues. Our method introduces a cross-modality feature alignment space (CFAS) and a heterogeneous agent feature alignment (HAFA) mechanism to dynamically harmonize multi-modality features across agents. Additionally, we present a novel V2XSet-Noise dataset that simulates realistic sensor imperfections under diverse environmental conditions, enabling a systematic evaluation of our approach's robustness. Extensive experiments on the V2X-Real and V2XSet-Noise benchmarks demonstrate that our framework achieves state-of-the-art performance, underscoring its potential for real-world cooperative autonomous driving. The controllable V2XSet-Noise dataset and its generation pipeline will be released.
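The abstract does not specify how V2XSet-Noise parameterizes its perturbations. As a hypothetical illustration only, not the authors' generation pipeline, the sketch below models three kinds of imperfection on a planar bird's-eye-view (BEV) agent pose: zero-mean Gaussian pose jitter standing in for calibration error or wind-induced vibration, a constant offset for systematic bias, and a transmission delay for time-synchronization error. Perspective distortion is not modeled. The names `Pose2D`, `perturb_pose`, `to_world`, and `time_offset_shift`, and all noise magnitudes, are assumptions made for this example.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Pose2D:
    """Hypothetical planar (BEV) agent pose in the world frame: x, y in metres, yaw in radians."""
    x: float
    y: float
    yaw: float


def perturb_pose(pose, trans_std, yaw_std_deg, bias=(0.0, 0.0, 0.0), rng=None):
    """Hypothetical extrinsic-noise model (an assumption, not the V2XSet-Noise recipe).

    Adds zero-mean Gaussian jitter (std trans_std metres, yaw_std_deg degrees) plus a
    constant systematic bias given as (dx metres, dy metres, dyaw degrees).
    """
    rng = rng or random.Random()
    dx, dy, dyaw_deg = bias
    return Pose2D(
        x=pose.x + dx + rng.gauss(0.0, trans_std),
        y=pose.y + dy + rng.gauss(0.0, trans_std),
        yaw=pose.yaw + math.radians(dyaw_deg + rng.gauss(0.0, yaw_std_deg)),
    )


def to_world(pose, px, py):
    """Rotate by yaw, then translate, a point (px, py) from the agent frame into the world frame."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return pose.x + c * px - s * py, pose.y + s * px + c * py


def time_offset_shift(speed_mps, delay_s):
    """Distance an object moving at speed_mps travels while a shared frame is delay_s late."""
    return speed_mps * delay_s


# Usage: project the same local point through a clean pose and a 1-degree yaw-biased pose.
clean = Pose2D(0.0, 0.0, 0.0)
biased = perturb_pose(clean, trans_std=0.0, yaw_std_deg=0.0, bias=(0.0, 0.0, 1.0))
error_m = math.dist(to_world(clean, 50.0, 0.0), to_world(biased, 50.0, 0.0))
print(f"1 deg yaw bias at 50 m: {error_m:.2f} m")
print(f"100 ms delay at 10 m/s: {time_offset_shift(10.0, 0.1):.2f} m")
```

With the jitter disabled, a pure yaw bias of θ displaces a point at range r by 2r·sin(θ/2), so 1° at 50 m gives about 0.87 m in the shared frame, while a 100 ms delay on an object moving at 10 m/s adds a further 1 m. Passing a seeded `random.Random` as `rng` makes the jittered poses reproducible across runs.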
