Spatiotemporal Feature Alignment and Weighted Fusion in Collaborative Perception Enabled by Network Synchronization and Age of Information
Qiaomei Han, Xianbin Wang, Minghui Liwang, Dusit Niyato
TL;DR
The paper addresses unreliable fusion in collaborative perception for the Internet of Vehicles (IoV), where clock drift and varying communication delays cause spatiotemporal misalignment between vehicles' features. It proposes a framework that first establishes a common temporal reference through network synchronization (two-way timestamp exchange with Kalman-based clock-state tracking) and Age of Information (AoI) estimation, then performs spatiotemporal feature alignment followed by uncertainty-aware, RoI-level weighted fusion. Key contributions include a robust synchronization scheme, an AoI-guided temporal compensator, and a reliability metric that jointly considers time-to-space uncertainty for fusion weighting, all trained end-to-end. Simulations under clock drift and link delay show consistent perception accuracy gains over strong baselines, and ablation studies confirm the value of synchronization and delay modeling. The approach lets fresh, reliable information be fused efficiently, improving decision-making in IoV while adapting to dynamic network conditions.
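The synchronization step can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the standard symmetric-delay two-way exchange formulas and a two-state (offset, skew) Kalman filter; the names `two_way_exchange` and `ClockTracker` and all noise parameters are hypothetical.

```python
def two_way_exchange(t1, t2, t3, t4):
    """Estimate clock offset and one-way delay from one two-way exchange.

    t1: request sent (local clock), t2: request received (remote clock),
    t3: reply sent (remote clock),  t4: reply received (local clock).
    Assumes the forward and return delays are equal (an assumption).
    Returns (remote_minus_local_offset, one_way_delay) in seconds.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = ((t4 - t1) - (t3 - t2)) / 2.0
    return offset, delay


class ClockTracker:
    """Two-state Kalman filter over [offset, skew]; only offset is measured.

    The noise values below are illustrative placeholders, not tuned settings.
    """

    def __init__(self, q_offset=1e-10, q_skew=1e-12, r=1e-6):
        self.offset, self.skew = 0.0, 0.0
        # Symmetric 2x2 covariance stored as p00, p01 (= p10), p11.
        self.p00, self.p01, self.p11 = 1.0, 0.0, 1.0
        self.q_offset, self.q_skew, self.r = q_offset, q_skew, r

    def update(self, measured_offset, dt):
        # Predict: the offset grows linearly with the skew over dt seconds.
        self.offset += dt * self.skew
        p00 = self.p00 + 2.0 * dt * self.p01 + dt * dt * self.p11 + self.q_offset
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q_skew
        # Correct with the offset measured by the latest exchange.
        innovation = measured_offset - self.offset
        s = p00 + self.r
        k0, k1 = p00 / s, p01 / s
        self.offset += k0 * innovation
        self.skew += k1 * innovation
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p11 = p11 - k1 * p01
        return self.offset, self.skew
```

Feeding each new exchange's offset into `ClockTracker.update` gives a smoothed offset and drift-rate estimate that can map remote timestamps onto the local reference between exchanges.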
Abstract
Collaborative perception in the Internet of Vehicles (IoV) aggregates multi-vehicle observations for broader scene coverage and improved decision-making. However, fusion quality degrades under spatiotemporal heterogeneity arising from unsynchronized clocks, communication delays, and motion variations across vehicles. Prior work mitigates these effects through spatial transformations or fixed time-offset corrections, overlooking the time-varying clock drifts and delays that cause persistent feature misalignment. To overcome these limitations, we propose a spatiotemporal feature alignment and weighted fusion framework. Specifically, network synchronization is designed to continuously compensate for clock-state differences between vehicles and to establish a common time reference onto which all feature timestamps can be mapped. After synchronization, to account for how much each received feature has aged since its generation, its Age of Information (AoI) is determined by estimating the network delay from the feature size and link quality. Our spatiotemporal feature alignment then projects the vehicles' features into a common spatial coordinate frame and corrects them to a synchronized fusion instant using their AoIs, so that all features describe the scene coherently. Furthermore, because synchronization and alignment quality vary, we estimate their uncertainties and integrate them with AoI to generate feature weights for efficient fusion, prioritizing fresh, reliable feature regions. Simulations show consistent perception accuracy improvements over strong baselines under clock drifts and link delays.
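As a rough illustration of the AoI and weighting steps, the sketch below estimates delay from feature size and link rate, computes AoI on the synchronized time reference, and turns AoI plus an uncertainty value into normalized fusion weights. The abstract does not give the weighting formula, so the exponential freshness decay, the inverse-variance term, and every function and parameter name here are assumptions.

```python
import math


def estimated_delay(feature_bits, link_rate_bps, propagation_s=0.0):
    """Transmission delay from payload size and link rate, plus propagation."""
    return feature_bits / link_rate_bps + propagation_s


def age_of_information(t_generated_remote, offset, t_fusion):
    """AoI at the fusion instant, measured on the reference (local) clock.

    offset is the remote clock minus the reference clock, so subtracting it
    maps the remote generation timestamp onto the common time reference.
    """
    return t_fusion - (t_generated_remote - offset)


def fusion_weights(aois, variances, decay=5.0, eps=1e-6):
    """Assumed weighting: fresher and less uncertain features weigh more."""
    raw = [math.exp(-decay * a) / (v + eps) for a, v in zip(aois, variances)]
    total = sum(raw)
    return [r / total for r in raw]


def fuse(features, weights):
    """Weighted sum of equally sized feature vectors (e.g., one RoI each)."""
    dim = len(features[0])
    return [sum(w * f[i] for f, w in zip(features, weights)) for i in range(dim)]


# Two vehicles report a feature for the same RoI with equal uncertainty.
aois = [0.05, 0.30]        # seconds since generation
variances = [0.1, 0.1]     # illustrative uncertainty estimates
w = fusion_weights(aois, variances)
fused = fuse([[1.0, 0.0], [0.0, 1.0]], w)
```

With equal uncertainties, the feature with the smaller AoI receives the larger weight, which matches the stated goal of prioritizing fresh, reliable regions.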
