Salient Sparse Visual Odometry With Pose-Only Supervision
Siyu Chen, Kangcheng Liu, Chen Wang, Shenghai Yuan, Jianfei Yang, Lihua Xie
TL;DR
The paper tackles robust visual odometry (VO) under challenging lighting and motion conditions while reducing the labeling burden. It introduces a pose-only supervised hybrid VO that bootstraps optical-flow learning through self-supervised homography pre-training and pairs a salient patch-based sparse flow estimator with a weighted bundle adjustment layer. Key contributions include the salient patch strategy, the homography-based pre-training, and the patch refinement module. Generalization is demonstrated across TartanAir, EuRoC, TUM, and OIVIO, plus a real-world robustness test. The proposed approach achieves competitive accuracy and superior robustness in unseen scenarios, offering a practical solution for autonomous systems that require reliable VO without dense optical-flow supervision.
Abstract
Visual Odometry (VO) is vital for the navigation of autonomous systems, providing accurate position and orientation estimates at reasonable cost. While traditional VO methods excel in some conditions, they struggle with challenges like variable lighting and motion blur. Deep learning-based VO, though more adaptable, can face generalization problems in new environments. Addressing these drawbacks, this paper presents a novel hybrid VO framework that leverages pose-only supervision, offering a balanced solution between robustness and the need for extensive labeling. We propose two cost-effective and innovative designs: a self-supervised homographic pre-training for enhancing optical flow learning from pose-only labels and a random patch-based salient point detection strategy for more accurate optical flow patch extraction. These designs eliminate the need for dense optical flow labels for training and significantly improve the generalization capability of the system in diverse and challenging environments. Our pose-only supervised method achieves competitive performance on standard datasets and greater robustness and generalization ability in extreme and unseen scenarios, even compared to dense optical flow-supervised state-of-the-art methods.
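To illustrate why homography-based pre-training needs no flow labels, the sketch below shows how a known homography applied to an image yields an exact dense flow field for free. This is a minimal NumPy illustration, not the authors' implementation: the function names `random_homography` and `homography_flow`, the sampling scheme, and the perturbation magnitudes are all assumptions made for the example.

```python
import numpy as np

def random_homography(rng):
    """Hypothetical sampler: identity plus a small random perturbation.

    The magnitudes below are illustrative assumptions, not values from the paper.
    """
    H = np.eye(3)
    H[:2, :2] += 0.05 * rng.standard_normal((2, 2))  # small affine distortion
    H[:2, 2] = 5.0 * rng.standard_normal(2)          # translation in pixels
    H[2, :2] = 1e-5 * rng.standard_normal(2)         # mild perspective terms
    return H

def homography_flow(H, height, width):
    """Dense flow (height, width, 2) induced by warping pixel (x, y) with H."""
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # homogeneous pixels
    warped = pts @ H.T                                    # apply H to each pixel
    warped = warped[..., :2] / warped[..., 2:3]           # back to pixel coords
    return warped - pts[..., :2]                          # displacement (dx, dy)

rng = np.random.default_rng(0)
H = random_homography(rng)
flow = homography_flow(H, height=480, width=640)
print(flow.shape)  # (480, 640, 2)
```

A flow network could then be trained to predict `flow` from an image and its warped copy, using the synthetic field as the target; how the paper combines this stage with pose-only supervision is described in the paper itself rather than in this sketch.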
