CoDynTrust: Robust Asynchronous Collaborative Perception via Dynamic Feature Trust Modulus

Yunjiang Xu, Lingzhi Li, Jin Wang, Benyuan Yang, Zhiwen Wu, Xinhong Chen, Jianping Wang

TL;DR

CoDynTrust tackles temporal asynchrony in multi-agent collaborative perception by introducing a Dynamic Feature Trust Modulus (DFTM) that quantifies the reliability of each region of interest (ROI) under both aleatoric and epistemic uncertainty. DFTM is paired with linear two-frame motion extrapolation and multi-scale hybrid fusion. The pipeline separates uncertainty-aware ROI generation, BEV flow-based motion compensation, and adaptive fusion of sparse features across agents, and it allows uncertainty to propagate to downstream planning and control. Experiments on DAIR-V2X, V2XSet, OPV2V, and Culver City Digital Twin show robust performance under communication delays and pose noise, with state-of-the-art AP at multiple IoU thresholds and improved resilience to asynchronous data. This work improves safety in autonomous driving by providing a principled, uncertainty-aware framework for asynchronous collaborative perception with practical communication efficiency.
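The summary mentions linear two-frame motion extrapolation but gives no formula. The sketch below shows the generic constant-velocity idea under stated assumptions: each ROI is reduced to a BEV center observed in two delayed frames, and its position is projected forward to the ego timestamp. The `RoiTrack` container and `extrapolate_linear` function are hypothetical names for illustration; the paper itself operates on BEV flow maps, so treat this only as a simplified instance of linear extrapolation, not as the authors' implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RoiTrack:
    # Hypothetical container: BEV center (x, y) of one ROI in two delayed frames.
    t1: float
    xy1: Tuple[float, float]
    t2: float
    xy2: Tuple[float, float]


def extrapolate_linear(track: RoiTrack, t_target: float) -> Tuple[float, float]:
    """Constant-velocity extrapolation of an ROI center to t_target (assumed model)."""
    dt = track.t2 - track.t1
    if dt <= 0:
        raise ValueError("frames must be strictly ordered in time")
    # Velocity estimated from the two most recent observations.
    vx = (track.xy2[0] - track.xy1[0]) / dt
    vy = (track.xy2[1] - track.xy1[1]) / dt
    # Time by which the newer frame lags behind the target (ego) timestamp.
    lead = t_target - track.t2
    return (track.xy2[0] + vx * lead, track.xy2[1] + vy * lead)


# Usage: an ROI moving +1 m per 100 ms along x, projected 300 ms ahead.
track = RoiTrack(0.0, (10.0, 5.0), 0.1, (11.0, 5.0))
print(extrapolate_linear(track, 0.4))
```

A constant-velocity model is the simplest choice that needs only two frames; it degrades for turning or accelerating objects, which is one reason an uncertainty-aware trust score on top of the compensated features is useful.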

Abstract

Collaborative perception, which fuses information from multiple agents, can extend perception range and thereby improve perception performance. However, temporal asynchrony in real-world environments, caused by communication delays, clock misalignment, or differences in sampling configuration, can lead to information mismatches. If these mismatches are not handled well, collaborative performance becomes unreliable and, worse, safety accidents may occur. To tackle this challenge, we propose CoDynTrust, an uncertainty-encoded asynchronous fusion perception framework that is robust to the information mismatches caused by temporal asynchrony. CoDynTrust generates a dynamic feature trust modulus (DFTM) for each region of interest by modeling aleatoric and epistemic uncertainty, and uses it to selectively suppress or retain single-vehicle features, thereby mitigating information mismatches. We then design a multi-scale fusion module to fuse the multi-scale feature maps processed by DFTM. Compared with existing works on asynchronous collaborative perception, CoDynTrust counters various kinds of low-quality information in temporally asynchronous scenarios and allows uncertainty to be propagated to downstream tasks such as planning and control. Experimental results demonstrate that CoDynTrust significantly reduces the performance degradation caused by temporal asynchrony across multiple datasets, achieving state-of-the-art detection performance even under temporal asynchrony. The code is available at https://github.com/CrazyShout/CoDynTrust.
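The abstract describes DFTM as a per-ROI score built from aleatoric and epistemic uncertainty that selectively suppresses or retains features, without giving the formula. The sketch below is one hypothetical way to realize that behavior, not the paper's definition: epistemic uncertainty is taken as the variance of ROI confidences across stochastic forward passes (e.g., MC dropout), the two uncertainties are combined as `exp(-(lam_a * u_a + lam_e * u_e))`, and ROIs whose trust falls below an assumed threshold `tau` are zeroed. All function names, weights, and the threshold are illustrative assumptions.

```python
import numpy as np


def epistemic_from_samples(conf_samples):
    """Assumed epistemic proxy: variance of ROI confidences over stochastic passes.

    conf_samples has shape (num_passes, num_rois).
    """
    return np.var(np.asarray(conf_samples, dtype=float), axis=0)


def trust_modulus(aleatoric, epistemic, lam_a=1.0, lam_e=1.0):
    """Hypothetical trust score in (0, 1]: higher uncertainty gives lower trust."""
    u = lam_a * np.asarray(aleatoric, dtype=float) + lam_e * np.asarray(epistemic, dtype=float)
    return np.exp(-u)


def modulate_roi_features(features, trust, tau=0.3):
    """Scale each ROI feature row by its trust; suppress ROIs with trust < tau."""
    trust = np.asarray(trust, dtype=float)
    gate = np.where(trust >= tau, trust, 0.0)  # retain (scaled) or suppress (zero)
    return features * gate[:, None]


# Usage: two ROIs, the second with high aleatoric uncertainty.
trust = trust_modulus(aleatoric=[0.0, 2.0], epistemic=[0.0, 0.0])
print(modulate_roi_features(np.ones((2, 4)), trust))
```

The exponential mapping is only a convenient monotone choice that keeps trust in (0, 1]; any learned or calibrated mapping could take its place.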

Paper Structure

This paper contains 19 sections, 4 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Based on DFTM, CoDynTrust can mitigate error propagation amplified by temporal asynchrony, effectively suppressing low-quality information and enhancing detection robustness.
  • Figure 2: System overview. The message packing process prepares ROI, uncertainty, and sparse features for efficient communication and BEV flow map generation. Message fusion generates the Dynamic Feature Trust Modulus and scatters it back to the sparse feature map, while the BEV flow map is generated and used for motion compensation. Finally, multi-scale Hybrid Fusion is applied to fuse feature maps from all agents.
  • Figure 3: Overall Structure of Hybrid Fusion.
  • Figure 4: Trade-off between detection performance (AP@0.7) and communication bandwidth under an expected 300 ms delay on the DAIR-V2X (left) and V2XSet (right) datasets.
  • Figure 5: Visualization of CoBEVFlow and CoDynTrust detection results on V2XSet with an expected time interval of 300 ms. CoDynTrust shows better detection quality than CoBEVFlow. Red boxes indicate detection results, while green boxes represent ground truth.
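The Figure 2 caption says the trust modulus is scattered back to the sparse feature map before fusion. The sketch below illustrates that scatter step under simplifying assumptions: ROIs are axis-aligned integer boxes `(r0, c0, r1, c1)` on the BEV grid, overlapping ROIs keep the highest trust, and cells outside every ROI get zero trust. The fusion step is a deliberately simple element-wise max used as a stand-in; it is not the paper's multi-scale Hybrid Fusion, and `scatter_trust` and `fuse_max` are hypothetical names.

```python
import numpy as np


def scatter_trust(shape_hw, rois, trust):
    """Build an (H, W) trust mask from axis-aligned ROIs; other cells stay 0."""
    h, w = shape_hw
    mask = np.zeros((h, w), dtype=float)
    for (r0, c0, r1, c1), t in zip(rois, trust):
        # Clip the ROI to the grid; skip it if nothing remains.
        r0, c0 = max(r0, 0), max(c0, 0)
        r1, c1 = min(r1, h), min(c1, w)
        if r0 < r1 and c0 < c1:
            # Where ROIs overlap, keep the highest trust value.
            mask[r0:r1, c0:c1] = np.maximum(mask[r0:r1, c0:c1], t)
    return mask


def fuse_max(ego_feat, collab_feats, collab_masks):
    """Stand-in fusion: element-wise max of ego and trust-modulated (C, H, W) maps."""
    fused = ego_feat.copy()
    for feat, mask in zip(collab_feats, collab_masks):
        fused = np.maximum(fused, feat * mask[None, :, :])
    return fused


# Usage: one collaborator ROI covering the top-left 2x2 cells with trust 0.5.
mask = scatter_trust((4, 4), [(0, 0, 2, 2)], [0.5])
fused = fuse_max(np.zeros((1, 4, 4)), [np.ones((1, 4, 4))], [mask])
print(fused[0])
```

Multiplying by a mask before fusion means low-trust regions contribute little or nothing, which matches the caption's intent of suppressing unreliable collaborator features before they reach the fusion module.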