3D Single-object Tracking in Point Clouds with High Temporal Variation
Qiao Wu, Kun Sun, Pei An, Mathieu Salzmann, Yanning Zhang, Jiaqi Yang
TL;DR
This work tackles 3D single-object tracking under high temporal variation (HV) by introducing HVTrack, a transformer-based framework augmented with a Relative-Pose-Aware Memory (RPM), Base-Expansion Feature Cross-Attention (BEA), and Contextual Point Guided Self-Attention (CPA). The KITTI-HV dataset is built by sampling KITTI at varying frame intervals to simulate HV conditions, enabling evaluation beyond standard smooth-variation benchmarks. HVTrack shows strong gains over state-of-the-art trackers, notably surpassing CXTrack on KITTI-HV and achieving leading performance on Waymo across HV settings, with ablations confirming the value of each module. The approach offers robust tracking in dynamic, cluttered environments and lays groundwork for HV-aware 3D SOT, with potential for further optimization and broader backbone support.
Abstract
The high temporal variation of point clouds is the key challenge of 3D single-object tracking (3D SOT). Existing approaches rely on the assumption that the shape variation of the point clouds and the motion of the objects across neighboring frames are smooth, and therefore fail to cope with high temporal variation data. In this paper, we present a novel framework for 3D SOT in point clouds with high temporal variation, called HVTrack. HVTrack introduces three novel components to tackle the challenges of the high temporal variation scenario: 1) a Relative-Pose-Aware Memory module to handle temporal point cloud shape variations; 2) a Base-Expansion Feature Cross-Attention module to deal with similar object distractions in expanded search areas; 3) a Contextual Point Guided Self-Attention module for suppressing heavy background noise. We construct a dataset with high temporal variation (KITTI-HV) by setting different frame intervals for sampling in the KITTI dataset. On KITTI-HV with a 5-frame sampling interval, our HVTrack surpasses the state-of-the-art tracker CXTrack by 11.3%/15.7% in Success/Precision.
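The frame-interval sampling used to build KITTI-HV can be illustrated with a minimal sketch. It assumes that an interval of k means keeping every k-th frame of a tracklet, starting from the first; the function name `subsample_tracklet` and the integer frame indices are hypothetical and are not taken from the paper's code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_tracklet(frames: Sequence[T], interval: int) -> List[T]:
    """Keep every `interval`-th frame of a tracklet, starting at frame 0.

    Assumption: this mirrors the idea of simulating high temporal
    variation by enlarging the gap between consecutive frames. The
    exact sampling protocol of KITTI-HV may differ.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return list(frames[::interval])


# Hypothetical tracklet of 12 consecutive frame indices.
tracklet = list(range(12))
sparse = subsample_tracklet(tracklet, interval=5)
print(sparse)
```

With an interval of 1 the tracklet is unchanged. Larger intervals widen the gap between consecutive frames, so the object's apparent motion and shape change grow accordingly.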
