DELTAv2: Accelerating Dense 3D Tracking
Tuan Duc Ngo, Ashkan Mirzaei, Guocheng Qian, Hanwen Liang, Chuang Gan, Evangelos Kalogerakis, Peter Wonka, Chaoyang Wang
TL;DR
DELTAv2 accelerates dense long-range 3D tracking by addressing two bottlenecks in prior methods: expensive transformer computation over many trajectories and the cost of 4D correlation features. It introduces a coarse-to-fine scheme that first tracks a sparse subset of points and progressively densifies it, using a learnable interpolation module to propagate motion to pixels that are not yet tracked. It also speeds up the 4D correlation computation with a lightweight projection that improves GPU utilization. Together, these changes yield roughly a 5x speedup over DELTA, and 5-100x over existing approaches, while maintaining state-of-the-art accuracy, making dense 3D tracking on RGB-D videos more practical at scale.
Abstract
We propose a novel algorithm for accelerating dense long-term 3D point tracking in videos. Through analysis of existing state-of-the-art methods, we identify two major computational bottlenecks. First, transformer-based iterative tracking becomes expensive when handling a large number of trajectories. To address this, we introduce a coarse-to-fine strategy that begins tracking with a small subset of points and progressively expands the set of tracked trajectories. The newly added trajectories are initialized using a learnable interpolation module, which is trained end-to-end alongside the tracking network. Second, we propose an optimization that significantly reduces the cost of correlation feature computation, another key bottleneck in prior methods. Together, these improvements lead to a 5-100x speedup over existing approaches while maintaining state-of-the-art tracking accuracy.
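The coarse-to-fine schedule described above can be illustrated with a toy sketch. This is not the authors' implementation: `refine` is a hypothetical placeholder for the transformer update (here it simply moves halfway toward a known target field), nearest-neighbor upsampling stands in for the learnable interpolation module, and the stride schedule and all function names are assumptions made for illustration.

```python
import numpy as np


def upsample_nearest(coarse, factor):
    """Densify a coarse motion field by repeating each entry.

    Assumption: a fixed nearest-neighbor rule replaces the learnable
    interpolation module, which is not specified in the abstract.
    """
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)


def refine(tracks, target):
    """Hypothetical stand-in for one iterative tracking update.

    A real tracker would use image features; this toy version moves
    the estimate halfway toward a known target motion field.
    """
    return tracks + 0.5 * (target - tracks)


def coarse_to_fine(target_dense, strides=(8, 4, 2, 1), iters=2):
    """Track a sparse grid first, then progressively densify it.

    Returns the final dense estimate and the number of tracked
    points at each level.
    """
    tracks = None
    prev_stride = None
    counts = []
    for stride in strides:
        # Points tracked at this level: a regular grid with this stride.
        target = target_dense[::stride, ::stride]
        if tracks is None:
            # Coarsest level: start from zero motion.
            tracks = np.zeros_like(target)
        else:
            # Initialize newly added trajectories from the coarser level,
            # cropping in case the image size is not divisible by the stride.
            factor = prev_stride // stride
            rows, cols = target.shape[:2]
            tracks = upsample_nearest(tracks, factor)[:rows, :cols]
        for _ in range(iters):
            tracks = refine(tracks, target)
        counts.append(target.shape[0] * target.shape[1])
        prev_stride = stride
    return tracks, counts


if __name__ == "__main__":
    target = np.full((16, 16, 3), 2.0)  # constant 3D motion for every pixel
    dense, counts = coarse_to_fine(target)
    print(counts)
    print(float(np.abs(dense - target).max()))
```

The point of the sketch is the cost profile: most refinement iterations run on small point sets, and the full-resolution level starts from an already good initialization instead of from scratch.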
