Range-Agnostic Multi-View Depth Estimation With Keyframe Selection
Andrea Conti, Matteo Poggi, Valerio Cambareri, Stefano Mattoccia
TL;DR
RAMDepth estimates depth from multiple posed views without relying on a prior on the scene depth range. It introduces a range-agnostic, purely 2D framework that reverses the traditional order of matching and depth estimation: depth is refined iteratively along epipolar lines using deformable, correlation-guided sampling, with the current depth estimate $D^s$ updated by a GRU and the final depth obtained via convex upsampling. A key byproduct is a per-view matchability score, which allows source views to be ranked and, potentially, pruned to save computation. Across diverse datasets, RAMDepth achieves accurate depth without depth-range priors and generalizes to monocular video and stereo setups, while offering a practical mechanism to select informative views for efficient inference.
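The view-ranking idea can be illustrated with a minimal sketch, assuming each source view has already been assigned one scalar matchability score where higher means more useful. The function name `select_views`, the `keep` parameter, and the score values are hypothetical and are not taken from the RAMDepth code.

```python
import numpy as np

def select_views(scores, keep):
    """Return the indices of the `keep` highest-scoring source views, best first.

    `scores` holds one hypothetical matchability score per source view;
    how such scores are computed is outside the scope of this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    # Sort by descending score; a stable sort keeps ties in input order.
    order = np.argsort(-scores, kind="stable")
    return order[:keep].tolist()

# Made-up scores for four source views.
scores = [0.12, 0.85, 0.40, 0.67]
kept = select_views(scores, keep=2)
```

Only the views in `kept` would then be passed to the depth network, trading a possible loss of accuracy for lower compute.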
Abstract
Methods for 3D reconstruction from posed frames require prior knowledge about the scene metric range, usually to recover matching cues along the epipolar lines and narrow the search range. However, such a prior might not be directly available, or might be estimated inaccurately, in real scenarios (e.g., outdoor 3D reconstruction from video sequences), which heavily hampers performance. In this paper, we focus on multi-view depth estimation without requiring prior knowledge about the metric range of the scene by proposing RAMDepth, an efficient and purely 2D framework that reverses the order of the depth estimation and matching steps. Moreover, we demonstrate the capability of our framework to provide rich insights about the quality of the views used for prediction. Additional material can be found on our project page: https://andreaconti.github.io/projects/range_agnostic_multi_view_depth.
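To make the link between depth and epipolar lines concrete, the following is a minimal sketch of standard pinhole reprojection, not the RAMDepth implementation. It assumes known intrinsics and a known relative pose; the names `reproject`, `K_tgt`, `K_src`, and `T_src_from_tgt` and all numeric values are illustrative. Changing the hypothesized depth of a target pixel moves its projection along the epipolar line in the source view, which is why a depth range normally bounds the search.

```python
import numpy as np

def reproject(pixel, depth, K_tgt, K_src, T_src_from_tgt):
    """Project a target-view pixel at a hypothesized depth into the source view."""
    u, v = pixel
    # Back-project the pixel to a ray with unit z in the target camera frame.
    ray = np.linalg.inv(K_tgt) @ np.array([u, v, 1.0])
    point_tgt = depth * ray
    # Move the 3D point into the source camera frame (4x4 rigid transform).
    R, t = T_src_from_tgt[:3, :3], T_src_from_tgt[:3, 3]
    point_src = R @ point_tgt + t
    # Perspective projection with the source intrinsics.
    uvw = K_src @ point_src
    return uvw[:2] / uvw[2]

# Illustrative setup: identical intrinsics and a pure sideways translation.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)
T[0, 3] = -0.1  # source camera shifted 0.1 units along +x

p_near = reproject((50, 50), 1.0, K, K, T)  # depth hypothesis of 1
p_far = reproject((50, 50), 2.0, K, K, T)   # depth hypothesis of 2
```

In this setup both projections share the same row, so the epipolar line is horizontal, and the larger depth lands closer to the original pixel column.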
