Adaptive Learning for Multi-view Stereo Reconstruction
Qinglu Min, Jie Zhao, Zhihao Zhang, Chen Min
TL;DR
This work tackles depth estimation in multi-view stereo (MVS) by reformulating the learning objective with an adaptive Wasserstein loss and introducing an offset module that produces continuous, sub-pixel depths. It combines a discrete depth probability with per-depth offsets, so the final depth at pixel $(u, v)$ is $d' = d_s + \mathrm{Offset}(u, v, d_s)$, where $d_s$ is a discrete depth hypothesis. This enables robust learning even when the predicted and ground-truth depth distributions do not share support. Empirically, the approach achieves state-of-the-art results on DTU and Tanks and Temples and also extends to BlendedMVS, with ablations confirming the benefits of both the adaptive Wasserstein loss and the offset module. The method offers a principled, end-to-end framework for accurate, continuous depth estimation in diverse MVS scenarios and could improve downstream 3D reconstruction in real-world settings.
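The depth formula above can be illustrated for a single pixel with a minimal sketch. It assumes the network has already produced, for pixel (u, v), a probability per depth hypothesis and one offset per hypothesis, and it assumes d_s is chosen as the most probable hypothesis; the function name refine_depth and all numbers are hypothetical, not the authors' code or results.

```python
from typing import Sequence


def refine_depth(probs: Sequence[float], hypotheses: Sequence[float],
                 offsets: Sequence[float]) -> float:
    """Hypothetical per-pixel refinement: d' = d_s + Offset(u, v, d_s).

    Assumption: d_s is the most probable discrete hypothesis (argmax).
    """
    if not (len(probs) == len(hypotheses) == len(offsets)):
        raise ValueError("probs, hypotheses and offsets must have equal length")
    # Index s of the most probable discrete depth hypothesis.
    s = max(range(len(probs)), key=lambda i: probs[i])
    # Add the offset predicted for that hypothesis to get a continuous depth.
    return hypotheses[s] + offsets[s]


if __name__ == "__main__":
    probs = [0.1, 0.7, 0.2]              # hypothetical probabilities for one pixel
    hypotheses = [500.0, 510.0, 520.0]   # hypothetical discrete depths
    offsets = [1.0, 3.5, -2.0]           # hypothetical per-depth offsets
    print(refine_depth(probs, hypotheses, offsets))
```

Here the second hypothesis (510.0) is the most probable, and its offset of 3.5 moves the estimate to 513.5, a value no discrete hypothesis could represent on its own. Selecting the offset that belongs to d_s, rather than averaging offsets, keeps the correction tied to the chosen hypothesis.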
Abstract
Deep learning has recently demonstrated excellent performance on multi-view stereo (MVS). However, the loss functions used in deep MVS have rarely been studied. In this paper, we first analyze the properties of existing loss functions for deep depth-based MVS approaches. Regression-based losses produce continuous but inaccurate depths because the prediction is computed as a mathematical expectation over depth hypotheses, while classification-based losses output only discretized depth values. To address both issues, we propose a novel loss function, the adaptive Wasserstein loss, which narrows the difference between the true and predicted probability distributions of depth. In addition, we introduce a simple but effective offset module to better achieve sub-pixel prediction accuracy. Extensive experiments on DTU, Tanks and Temples and BlendedMVS show that the proposed method, with the adaptive Wasserstein loss and the offset module, achieves state-of-the-art performance.
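The contrast between regression-based and classification-based outputs, and why a Wasserstein distance can compare distributions with different supports, can be sketched for one pixel. This is the plain 1-D Wasserstein-1 distance to a Dirac delta at the ground-truth depth, not the authors' adaptive variant, whose exact form is not given here; all function names and numbers are hypothetical and chosen only to make the contrast visible.

```python
from typing import Sequence


def regression_depth(probs: Sequence[float], hyps: Sequence[float]) -> float:
    """Regression-style output: expected depth under the predicted distribution."""
    return sum(p * d for p, d in zip(probs, hyps))


def classification_depth(probs: Sequence[float], hyps: Sequence[float]) -> float:
    """Classification-style output: the single most probable discrete hypothesis."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return hyps[best]


def wasserstein1_to_dirac(probs: Sequence[float], hyps: Sequence[float],
                          d_gt: float) -> float:
    """Plain 1-D Wasserstein-1 distance between the prediction and a Dirac at d_gt.

    With a Dirac target, the only transport plan moves all mass to d_gt, so W1
    reduces to the expected absolute depth error. It stays finite even when d_gt
    is not one of the hypotheses (disjoint supports).
    """
    return sum(p * abs(d - d_gt) for p, d in zip(probs, hyps))


if __name__ == "__main__":
    hyps = [500.0, 510.0, 520.0]  # hypothetical depth hypotheses
    probs = [0.6, 0.0, 0.4]       # hypothetical bimodal prediction for one pixel
    d_gt = 503.0                  # hypothetical ground truth between hypotheses
    print(regression_depth(probs, hyps))
    print(classification_depth(probs, hyps))
    print(wasserstein1_to_dirac(probs, hyps, d_gt))
```

With these numbers the expectation lands near 508, a depth the prediction assigns almost no mass to, while the argmax is stuck on the grid at 500. The W1 value (about 8.6) is well defined even though 503 is not a hypothesis, whereas a cross-entropy loss against a one-hot target would first have to snap the ground truth to a bin.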
