Spatiotemporal Feature Alignment and Weighted Fusion in Collaborative Perception Enabled by Network Synchronization and Age of Information

Qiaomei Han, Xianbin Wang, Minghui Liwang, Dusit Niyato

TL;DR

The paper addresses the challenge of unreliable fusion in collaborative IoV perception caused by spatiotemporal misalignment from clock drift and varying delays. It proposes a framework combining network synchronization via two-way timestamp exchange and Kalman-based clock-state tracking with AoI to establish a common temporal reference, then performs spatiotemporal feature alignment and an uncertainty-aware, RoI-level weighted fusion. Key contributions include a robust synchronization scheme, a temporal compensator guided by AoI, and a reliability metric that jointly considers time-to-space uncertainty for fusion weighting, all trained end-to-end. Results in simulated clock drift and delay scenarios show consistent perception accuracy gains over strong baselines, with ablation studies confirming the value of synchronization and delay modeling. The approach enables fresh, reliable information to be fused efficiently, improving decision-making in IoV while adapting to dynamic network conditions.

Abstract

Collaborative perception in the Internet of Vehicles (IoV) aggregates multi-vehicle observations for broader scene coverage and improved decision-making. However, fusion quality degrades under spatiotemporal heterogeneity arising from unsynchronized clocks, communication delays, and motion variations across vehicles. Prior work mitigates these effects through spatial transformations or fixed time-offset corrections, overlooking the time-varying clock drifts and delays that cause persistent feature misalignment. To overcome these limitations, we propose a spatiotemporal feature alignment and weighted fusion framework. Specifically, network synchronization is designed to continuously compensate for clock state differences between vehicles and establish a common time reference, onto which all feature timestamps can be mapped. After synchronization, to account for the freshness of received features since their generation, their Age of Information (AoI) is determined by estimating the network delay from the given feature size and link quality. Our spatiotemporal feature alignment then projects the vehicles' features into one spatial coordinate frame and corrects them to a synchronized fusion instant using their AoIs, so that all features describe the scene coherently. Furthermore, because synchronization and alignment quality vary, we estimate their uncertainties and integrate them with AoI to generate feature weights for efficient fusion, prioritizing fresh, reliable feature regions. Simulations show consistent perception accuracy improvements over strong baselines under clock drifts and link delays.
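
To make the synchronization and AoI steps concrete, here is a minimal Python sketch. It is not the paper's formulation, which this summary does not reproduce. It assumes the classic two-way timestamp estimate with symmetric one-way delays, a two-state (offset, skew) Kalman filter with illustrative noise values, and AoI taken as fusion time minus generation time on the ego vehicle's clock. The names `Exchange`, `ClockTracker`, `to_ego_time`, and `age_of_information` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Exchange:
    """One two-way timestamp exchange (hypothetical container).

    t1: ego sends request (ego clock); t2: neighbor receives it (neighbor clock)
    t3: neighbor sends reply (neighbor clock); t4: ego receives it (ego clock)
    """
    t1: float
    t2: float
    t3: float
    t4: float


def offset_and_delay(x: Exchange) -> Tuple[float, float]:
    """Two-way estimate, assuming the one-way delays are symmetric."""
    offset = ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2.0  # neighbor clock minus ego clock
    delay = ((x.t4 - x.t1) - (x.t3 - x.t2)) / 2.0   # one-way propagation delay
    return offset, delay


class ClockTracker:
    """Two-state Kalman filter over (offset, skew); noise values are illustrative."""

    def __init__(self, q_offset: float = 1e-10, q_skew: float = 1e-12, r: float = 1e-6):
        self.offset, self.skew = 0.0, 0.0
        # Symmetric 2x2 covariance stored as three scalars.
        self.p00, self.p01, self.p11 = 1.0, 0.0, 1e-6
        self.q_offset, self.q_skew, self.r = q_offset, q_skew, r

    def predict(self, dt: float) -> None:
        """Propagate the state by dt seconds: the offset grows linearly with skew."""
        self.offset += self.skew * dt
        self.p00 += 2.0 * dt * self.p01 + dt * dt * self.p11 + self.q_offset
        self.p01 += dt * self.p11
        self.p11 += self.q_skew

    def update(self, measured_offset: float) -> None:
        """Correct the state with one offset measurement from an exchange."""
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p01 / s
        innovation = measured_offset - self.offset
        self.offset += k0 * innovation
        self.skew += k1 * innovation
        self.p11 -= k1 * self.p01
        self.p01 *= 1.0 - k0
        self.p00 *= 1.0 - k0


def to_ego_time(t_neighbor: float, tracker: ClockTracker) -> float:
    """Map a neighbor timestamp onto the ego clock used as the common reference."""
    return t_neighbor - tracker.offset


def age_of_information(t_fusion: float, t_generated_ego: float) -> float:
    """AoI at fusion: elapsed time since the feature was generated."""
    return max(0.0, t_fusion - t_generated_ego)
```

In use, each completed exchange would feed `offset_and_delay` into `ClockTracker.update`, with `predict` called for the time elapsed between exchanges. The tracked offset then maps a neighbor's feature timestamp onto the ego clock before its AoI is computed.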

Paper Structure

This paper contains 33 sections, 2 theorems, 43 equations, 7 figures.

Key Result

Lemma 1

According to Definition 1, the ego vehicle's local fusion timestamp $\hat{\tau}_f$ is mapped to the shared temporal reference, yielding the synchronized fusion instant:

Figures (7)

  • Figure 1: Collaborative perception impacted by spatiotemporal feature misalignment and inefficient feature fusion.
  • Figure 2: Workflow of our proposed framework for collaborative perception, including network synchronization, spatiotemporal feature alignment, and weighted feature fusion.
  • Figure 3: AoI and related metrics on a shared time reference: $s_l(t_f)$ (feature generation time), $u_{l\to m}(t_f)$ (latest arrived generation time), $t_f$ (fusion time), and $t_f+\hat{\Delta t}_{\mathrm{comm}}(l{\to}m,r,t_f)$ (predicted arrival time). Brackets: ① $\mathcal{A}_{l\to m}(t_f)$, ② $\mathcal{S}_l(t_f)$, ③ $\tilde{\mathcal{A}}_{l\to m}(r,t_f)$.
  • Figure 4: Performance comparison of various schemes, including our proposed method, CoBEVFlow, and SyncNet, evaluating their (a) mAP@0.5, and (b) mAP@0.7.
  • Figure 5: Impact of temporal misalignment, from 2 to 7 timestamps, evaluating their (a) mAP@0.5, and (b) mAP@0.7.
  • ...and 2 more figures

Theorems & Definitions (7)

  • Definition 1: Synchronized Timestamp Mapping
  • Lemma 1: Fusion time
  • Definition 2: Source Age at Fusion
  • Definition 3: Arrival Age (AoI) at Fusion
  • Proposition 1
  • Definition 4: Delivery-Time AoI
  • Example 1
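
Definitions 2 to 4 are listed here by title only. As a reading aid, the block below sketches one plausible form for each, using the symbols from the Figure 3 caption and the usual AoI convention of "current time minus generation time". These expressions are assumptions inferred from the caption, not statements quoted from the paper.

```latex
% Assumed forms inferred from the Figure 3 caption; not quoted from the paper.
\begin{align}
  % Source age at fusion: time since vehicle l generated its latest feature.
  \mathcal{S}_l(t_f) &= t_f - s_l(t_f), \\
  % Arrival age (AoI) at fusion: time since generation of the latest feature
  % that has already arrived at vehicle m.
  \mathcal{A}_{l\to m}(t_f) &= t_f - u_{l\to m}(t_f), \\
  % Delivery-time AoI: age the feature would have on its predicted arrival.
  \tilde{\mathcal{A}}_{l\to m}(r,t_f) &= t_f + \hat{\Delta t}_{\mathrm{comm}}(l{\to}m,r,t_f) - s_l(t_f).
\end{align}
```

Under these assumed forms, the delivery-time AoI equals the source age plus the predicted communication delay, which matches the ordering of the three brackets described in the Figure 3 caption.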