SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking
Yu Lin, Zhiheng Li, Yubo Cui, Zheng Fang
TL;DR
SeqTrack3D introduces a Seq2Seq framework for robust 3D single object tracking by jointly modeling sequences of point clouds and bounding boxes. It employs a Transformer-based encoder–decoder with a decoupled local-global feature encoding strategy to capture both spatial geometry and inter-frame motion, guided by historical boxes. The method achieves state-of-the-art results on NuScenes and Waymo, demonstrating enhanced robustness in sparse point scenarios while maintaining efficient runtime. The work highlights the value of sequence-level supervision for continuous 3D tracking, and the authors state that the code will be made public for reproducibility.
Abstract
3D single object tracking (SOT) is an important and challenging task for autonomous driving and mobile robotics. Most existing methods perform tracking between two consecutive frames and ignore the motion patterns of the target over a series of frames, which degrades performance in scenes with sparse points. To overcome this limitation, we introduce a Sequence-to-Sequence tracking paradigm and a tracker named SeqTrack3D that captures target motion across continuous frames. Previous methods primarily adopt one of three strategies: matching two consecutive point clouds, predicting relative motion, or using sequential point clouds to address feature degradation. In contrast, SeqTrack3D combines historical point clouds with bounding box sequences. This ensures robust tracking by leveraging location priors from historical boxes, even in scenes with sparse points. Extensive experiments on large-scale datasets show that SeqTrack3D achieves new state-of-the-art performance, improving by 6.00% on the NuScenes dataset and 14.13% on the Waymo dataset. The code will be made public at https://github.com/aron-lin/seqtrack3d.
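To make the idea of a location prior from historical boxes concrete, the following is a minimal sketch, not the SeqTrack3D network. It assumes a simple constant-velocity extrapolation in place of the learned Transformer decoder, and the box layout `(x, y, z, yaw)`, the function names `extrapolate_box` and `crop_near_prior`, and the `radius` parameter are all hypothetical choices made for illustration only.

```python
import numpy as np


def extrapolate_box(history):
    """Guess the next box from past boxes (assumption: constant velocity).

    history: (T, 4) array of boxes as (x, y, z, yaw), oldest first.
    This stands in for a learned motion model; it is not the paper's method.
    """
    history = np.asarray(history, dtype=float)
    if len(history) < 2:
        # With a single box there is no motion to estimate.
        return history[-1].copy()
    deltas = np.diff(history, axis=0)
    # Wrap yaw differences into [-pi, pi) so a turn across +/-pi stays small.
    deltas[:, 3] = (deltas[:, 3] + np.pi) % (2 * np.pi) - np.pi
    # Average per-frame change, applied once to the most recent box.
    return history[-1] + deltas.mean(axis=0)


def crop_near_prior(points, prior, radius=2.0):
    """Keep points whose xy distance to the prior center is within radius."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, :2] - prior[:2], axis=1)
    return points[dist <= radius]


# Three historical boxes moving 1 m per frame along x.
history = [[0.0, 0.0, 0.0, 0.0],
           [1.0, 0.0, 0.0, 0.0],
           [2.0, 0.0, 0.0, 0.0]]
prior = extrapolate_box(history)

# A sparse current frame: one point near the target, one far away.
current_points = [[3.1, 0.2, 0.0],
                  [10.0, 10.0, 0.0]]
candidates = crop_near_prior(current_points, prior)
```

Even when the current frame holds only a handful of points, the box history still yields a usable search region. This is the intuition behind using historical boxes as a prior; in the paper this role is played by the learned sequence model rather than a fixed motion rule.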
