LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry
Weirong Chen, Le Chen, Rui Wang, Marc Pollefeys
TL;DR
LEAP-VO addresses robust visual odometry in dynamic environments by moving beyond two-view tracking to long-term point tracking that leverages temporal context. Its core module, LEAP, fuses visual cues, inter-track information via anchors, and a temporal probabilistic model to estimate trajectory distributions and per-point uncertainty, expressed as $p(\mathbf{X}|\mathbf{I},\mathbf{x}_q)$ with a multivariate Cauchy formulation. With LEAP as its front-end, LEAP-VO tracks points over a window, filters tracks by visibility, dynamism, and uncertainty, and optimizes poses with sliding-window bundle adjustment (BA) using LEAP mappings $\text{LEAP}_{i\rightarrow j}$. Experiments on Replica, MPI Sintel, and TartanAir demonstrate substantial improvements over state-of-the-art baselines, particularly in dynamic scenes and under occlusion, highlighting the practical impact of long-term, uncertainty-aware tracking for VO.
Abstract
Visual odometry estimates the motion of a moving camera based on visual input. Existing methods, mostly focusing on two-view point tracking, often ignore the rich temporal context in the image sequence, thereby overlooking the global motion patterns and providing no assessment of the full trajectory reliability. These shortcomings hinder performance in scenarios with occlusion, dynamic objects, and low-texture areas. To address these challenges, we present the Long-term Effective Any Point Tracking (LEAP) module. LEAP innovatively combines visual, inter-track, and temporal cues with mindfully selected anchors for dynamic track estimation. Moreover, LEAP's temporal probabilistic formulation integrates distribution updates into a learnable iterative refinement module to reason about point-wise uncertainty. Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes. Our mindful integration showcases a novel practice by employing long-term point tracking as the front-end. Extensive experiments demonstrate that the proposed pipeline significantly outperforms existing baselines across various visual odometry benchmarks.
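To make the track-selection idea concrete, here is a minimal sketch of how point tracks might be filtered by visibility, dynamic motion, and uncertainty before they reach pose optimization. It is an illustration under stated assumptions, not the authors' implementation: the function name `filter_tracks`, the threshold values, and the per-track (rather than per-frame) dynamic score are all hypothetical.

```python
def filter_tracks(visibility, dynamic_prob, uncertainty,
                  vis_thresh=0.5, dyn_thresh=0.5, unc_thresh=2.0):
    """Return a per-track, per-frame keep mask as nested lists of bools.

    visibility[n][t]  : assumed visibility score of track n in frame t
    dynamic_prob[n]   : assumed probability that track n lies on a moving object
    uncertainty[n][t] : assumed scalar uncertainty of track n in frame t
    All thresholds are hypothetical defaults chosen for illustration.
    """
    keep = []
    for vis_row, dyn, unc_row in zip(visibility, dynamic_prob, uncertainty):
        # A track on a dynamic object is dropped in every frame.
        is_static = dyn < dyn_thresh
        keep.append([
            is_static and v > vis_thresh and u < unc_thresh
            for v, u in zip(vis_row, unc_row)
        ])
    return keep


# Three tracks observed in two frames.
visibility = [[0.9, 0.9], [0.9, 0.2], [0.9, 0.9]]
dynamic_prob = [0.1, 0.1, 0.8]
uncertainty = [[0.5, 3.0], [0.5, 0.5], [0.5, 0.5]]

mask = filter_tracks(visibility, dynamic_prob, uncertainty)
```

In this toy input, the first track loses its second observation to high uncertainty, the second loses its second observation to occlusion, and the third is rejected entirely as dynamic. Only the surviving observations would contribute residuals to the sliding-window bundle adjustment.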
