Revisit Self-supervised Depth Estimation with Local Structure-from-Motion
Shengjie Zhu, Xiaoming Liu
TL;DR
The paper bridges the gap between self-supervised depth estimation and Structure-from-Motion (SfM) with a local SfM pipeline that operates over a small window of as few as $5$ frames. Instead of learning-through-loss, it optimizes camera poses with a Bundle-RANSAC-Adjustment algorithm, then applies a frustum Radiance Field triangulation with geometric verification to produce sparse, geometrically verified depth while preserving metric scale. The pipeline outputs poses, depth adjustments, and sparse triangulated depths, which allow self-supervision to improve SoTA supervised models. It reports state-of-the-art sparse-view pose accuracy and robust self-supervised correspondence estimation on RGB-D data. The work also demonstrates practical benefits for temporally consistent depth, AR compositing, and NeRF-style rendering with a non-neural triangulation step, and provides theoretical extensions through Hough Transform acceleration with accompanying proofs.
Abstract
Both self-supervised depth estimation and Structure-from-Motion (SfM) recover scene depth from RGB videos. Despite sharing a similar objective, the two approaches are disconnected. Prior self-supervised works backpropagate losses defined over immediately neighboring frames. Instead of learning-through-loss, this work proposes an alternative scheme that performs local SfM. First, with calibrated RGB or RGB-D images, we employ a depth and correspondence estimator to infer depthmaps and pair-wise correspondence maps. Then, a novel bundle-RANSAC-adjustment algorithm jointly optimizes camera poses and one depth adjustment for each depthmap. Finally, we fix the camera poses and employ a NeRF, albeit without a neural network, for dense triangulation and geometric verification. Our outputs are poses, depth adjustments, and triangulated sparse depths. For the first time, we show that self-supervision within $5$ frames already benefits SoTA supervised depth and correspondence models. The project page is available at https://shngjz.github.io/SSfM.github.io/.
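To make the geometry behind the pose step concrete, the following is a minimal sketch of how a depthmap, a depth adjustment, and a candidate relative pose induce a correspondence that can be checked against an estimated correspondence map. It is not the bundle-RANSAC-adjustment algorithm itself. The function names `project_with_depth` and `count_inliers`, the modeling of the depth adjustment as a single scalar `scale`, and the pixel threshold are all assumptions made for illustration.

```python
import numpy as np


def project_with_depth(K, R, t, pixels, depths, scale=1.0):
    """Map pixels of frame i into frame j with a pinhole camera model.

    K      : (3, 3) intrinsics shared by both frames (assumption).
    R, t   : candidate relative pose, with X_j = R @ X_i + t.
    pixels : (N, 2) pixel coordinates (u, v) in frame i.
    depths : (N,) estimated depths at those pixels.
    scale  : hypothetical scalar depth adjustment for the depthmap.
    """
    ones = np.ones((pixels.shape[0], 1))
    homogeneous = np.hstack([pixels, ones])          # (N, 3)
    rays = homogeneous @ np.linalg.inv(K).T          # K^{-1} [u, v, 1]^T
    points_i = rays * (scale * depths)[:, None]      # back-project to 3D
    points_j = points_i @ R.T + t                    # move into frame j
    projected = points_j @ K.T                       # re-project
    return projected[:, :2] / projected[:, 2:3]


def count_inliers(predicted, observed, threshold_px=1.0):
    """Count correspondences whose reprojection error is below a threshold."""
    errors = np.linalg.norm(predicted - observed, axis=1)
    return int(np.sum(errors < threshold_px))


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 50.0],
                  [0.0, 100.0, 50.0],
                  [0.0, 0.0, 1.0]])
    R = np.eye(3)
    t = np.array([1.0, 0.0, 0.0])
    pixels = np.array([[50.0, 50.0]])
    depths = np.array([2.0])

    predicted = project_with_depth(K, R, t, pixels, depths)
    observed = np.array([[100.5, 50.0]])  # stand-in for a correspondence map
    print(predicted, count_inliers(predicted, observed))
```

In a RANSAC-style scheme, an inlier count of this kind could score each pose and scale hypothesis; how the paper actually scores and refines hypotheses is not specified in the abstract.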
