AllTracker: Efficient Dense Point Tracking at High Resolution
Adam W. Harley, Yang You, Xinglong Sun, Yang Zheng, Nikhil Raghuraman, Yunqi Gu, Sheldon Liang, Wen-Hsuan Chu, Achal Dave, Pavel Tokmakov, Suya You, Rares Ambrus, Katerina Fragkiadaki, Leonidas J. Guibas
TL;DR
AllTracker reframes long-range point tracking as dense, multi-frame optical flow between a query frame and all other frames, enabling all-pixel trajectories at high resolution. The method combines a ConvNeXt-based encoder, multi-scale appearance correlation, and iterative refinement with pixel-aligned temporal attention over sliding windows to produce dense flow and visibility estimates. It achieves state-of-the-art point-tracking accuracy at high resolution while remaining memory- and speed-efficient enough for near real-time inference, and joint training on optical flow and point-tracking data proves crucial to its performance. The work makes the case for dense, long-range tracking with an extensive ablation study of architecture and training choices, and it notes limitations in short-range motion estimation, pointing to larger temporal contexts and physics-informed constraints as directions for future work.
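To make the reframing concrete: if a model outputs, for every frame t, a flow field from the query frame to frame t, then each query pixel's trajectory is simply its grid coordinate plus the flow at that pixel. The NumPy sketch below assumes a (T, H, W, 2) array layout with (dx, dy) channels and frame 0 as the query frame; the function name `flows_to_tracks` and this layout are hypothetical illustrations, not the authors' API, and the visibility output is not handled here.

```python
import numpy as np


def flows_to_tracks(flows: np.ndarray) -> np.ndarray:
    """Turn query-to-frame flow fields into one trajectory per query pixel.

    flows: (T, H, W, 2) array; flows[t, y, x] is the hypothetical (dx, dy)
           displacement of query pixel (x, y) into frame t.
    Returns: (H * W, T, 2) array of absolute (x, y) positions, row-major over pixels.
    """
    T, H, W, _ = flows.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).astype(flows.dtype)  # (H, W, 2) pixel coords
    positions = grid[None] + flows  # broadcast the grid over all T frames
    return positions.reshape(T, H * W, 2).transpose(1, 0, 2)


# Toy example: 3 frames of a 2x4 image; in frame 1 everything shifts 1.5 px right.
flows = np.zeros((3, 2, 4, 2), dtype=np.float32)
flows[1, ..., 0] = 1.5
tracks = flows_to_tracks(flows)
print(tracks.shape)  # (8, 3, 2)
print(tracks[5])     # trajectory of query pixel (x=1, y=1)
```

At the 768×1024 resolution quoted in the abstract, a single query frame yields 768 × 1024 = 786,432 trajectories, which is why a dense flow-style output is a natural fit.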
Abstract
We introduce AllTracker: a model that estimates long-range point tracks by way of estimating the flow field between a query frame and every other frame of a video. Unlike existing point tracking methods, our approach delivers high-resolution and dense (all-pixel) correspondence fields, which can be visualized as flow maps. Unlike existing optical flow methods, our approach establishes correspondences between one frame and hundreds of subsequent frames, rather than just the next frame. We develop a new architecture for this task, blending techniques from existing work in optical flow and point tracking: the model performs iterative inference on low-resolution grids of correspondence estimates, propagating information spatially via 2D convolution layers, and propagating information temporally via pixel-aligned attention layers. The model is fast and parameter-efficient (16 million parameters), and delivers state-of-the-art point tracking accuracy at high resolution (i.e., tracking 768×1024 pixels on a 40 GB GPU). A benefit of our design is that we can train jointly on optical flow datasets and point tracking datasets, and we find that doing so is crucial for top performance. We provide an extensive ablation study on our architecture details and training recipe, making it clear which details matter most. Our code and model weights are available at https://alltracker.github.io
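The two propagation mechanisms named in the abstract can be illustrated with a single refinement step: a 3×3 convolution mixes information spatially within each frame, and attention along the time axis, restricted to the same pixel location in every frame, mixes information temporally. This NumPy sketch uses one attention head, residual updates, and random weights; the function names, shapes, and the ordering of the two operations are assumptions for illustration, not the authors' implementation, and it omits the correlation features, sliding windows, and learned update heads of the actual model.

```python
import numpy as np


def spatial_conv3x3(feats: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 'same' 3x3 convolution applied to each frame independently.

    feats: (T, H, W, C); kernel: (3, 3, C, C). Like most deep-learning
    frameworks, this computes cross-correlation (the kernel is not flipped).
    """
    T, H, W, C = feats.shape
    padded = np.pad(feats, ((0, 0), (1, 1), (1, 1), (0, 0)))
    out = np.zeros(feats.shape, dtype=np.result_type(feats, kernel))
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + H, dx:dx + W, :] @ kernel[dy, dx]
    return out


def pixel_aligned_temporal_attention(feats, wq, wk, wv):
    """Single-head self-attention over time, run separately at every pixel.

    feats: (T, H, W, C); wq, wk, wv: (C, C). Location (y, x) attends only to
    (y, x) in the other frames, so the score tensor has H*W*T*T entries
    instead of (H*W*T)**2.
    """
    T, H, W, C = feats.shape
    x = feats.transpose(1, 2, 0, 3).reshape(H * W, T, C)  # one sequence per pixel
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(C)  # (H*W, T, T)
    scores -= scores.max(axis=-1, keepdims=True)  # softmax numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v  # (H*W, T, C)
    return out.reshape(H, W, T, C).transpose(2, 0, 1, 3)


def refine_step(feats, params):
    """One hypothetical residual update: spatial mixing, then temporal mixing."""
    feats = feats + spatial_conv3x3(feats, params["conv"])
    return feats + pixel_aligned_temporal_attention(
        feats, params["wq"], params["wk"], params["wv"])


rng = np.random.default_rng(0)
T, H, W, C = 8, 6, 6, 16
params = {
    "conv": rng.standard_normal((3, 3, C, C)) * 0.05,
    "wq": rng.standard_normal((C, C)) * 0.1,
    "wk": rng.standard_normal((C, C)) * 0.1,
    "wv": rng.standard_normal((C, C)) * 0.1,
}
feats = rng.standard_normal((T, H, W, C))
for _ in range(4):  # iterative refinement on a fixed low-resolution grid
    feats = refine_step(feats, params)
print(feats.shape)  # (8, 6, 6, 16)
```

Because each attention call compares a pixel only with itself across frames, its cost grows quadratically in the number of frames but only linearly in the number of pixels, while the convolution handles communication between neighboring pixels.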
