StreamLTS: Query-based Temporal-Spatial LiDAR Fusion for Cooperative Object Detection
Yunshuang Yuan, Monika Sester
TL;DR
The paper tackles cooperative object detection when the LiDAR sensors of multiple intelligent agents tick asynchronously. It introduces Time-Aligned Cooperative Object Detection (TA-COOD) and a fully sparse, query-based temporal-spatial fusion framework named StreamLTS, which uses per-object timestamps and a memory-augmented transformer to fuse data before sharing. It also adapts two prominent datasets to reflect sub-frame asynchrony, creating OPV2Vt and DairV2Xt, and demonstrates superior accuracy and efficiency compared with state-of-the-art methods, especially at higher IoU thresholds. The findings show that modeling precise point-wise observation timestamps improves the modeling of temporal context and the prediction of objects' time-dependent locations, with practical implications for robust cooperative perception and real-time autonomy.
Abstract
Cooperative perception via communication among intelligent traffic agents has great potential to improve the safety of autonomous driving. However, limited communication bandwidth, localization errors, and asynchronous capture times of sensor data all make it difficult to fuse data from different agents. To some extent, previous works have attempted to reduce the shared data size and to mitigate the spatial feature misalignment caused by localization errors and communication delay. However, none of them has considered asynchronous sensor ticking times, which can lead to dynamic object misplacement of more than one meter during data fusion. In this work, we propose Time-Aligned COoperative Object Detection (TA-COOD), for which we adapt the widely used datasets OPV2V and DairV2X to account for asynchronous LiDAR sensor ticking times, and build an efficient fully sparse framework that models the temporal information of individual objects with query-based techniques. The experimental results confirm the superior efficiency of our fully sparse framework compared to state-of-the-art dense models. More importantly, they show that the point-wise observation timestamps of dynamic objects are crucial for accurately modeling the temporal context of objects and the predictability of their time-related locations. The official code is available at https://github.com/YuanYunshuang/CoSense3D.
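To make the "more than one meter" misplacement concrete, the following is a minimal arithmetic sketch and is not taken from the paper or the CoSense3D code. It assumes a constant-velocity object; the function names `misplacement_m` and `align_to_reference`, the 15 m/s relative speed, and the 0.1 s observation-time offset are all illustrative assumptions. The second function shows what aligning an observation to a common reference time could look like under that same constant-velocity assumption, which is a simplification and not the query-based model the abstract describes.

```python
from typing import Tuple


def misplacement_m(relative_speed_mps: float, time_offset_s: float) -> float:
    """Distance (m) an object moves between two asynchronous observations.

    Assumes constant relative speed over the offset (illustrative only).
    """
    return abs(relative_speed_mps * time_offset_s)


def align_to_reference(
    position_xy: Tuple[float, float],
    velocity_xy: Tuple[float, float],
    t_obs_s: float,
    t_ref_s: float,
) -> Tuple[float, float]:
    """Shift an observed 2D position to a common reference time.

    Uses a constant-velocity motion model, which is an assumption made
    for this sketch and not the method proposed in the paper.
    """
    dt = t_ref_s - t_obs_s
    return (
        position_xy[0] + velocity_xy[0] * dt,
        position_xy[1] + velocity_xy[1] * dt,
    )


if __name__ == "__main__":
    # Hypothetical numbers: 15 m/s relative speed, 0.1 s observation offset.
    print(misplacement_m(15.0, 0.1))
    # Object seen at t = 0.03 s, aligned to a reference time of t = 0.10 s.
    print(align_to_reference((10.0, 2.0), (15.0, 0.0), 0.03, 0.10))
```

Under these assumed numbers the offset alone displaces the object by about 1.5 m, which is why two agents observing the same vehicle at different instants within a sweep can disagree on its location unless each observation carries its own timestamp.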
