Adaptive Learning for Multi-view Stereo Reconstruction

Qinglu Min, Jie Zhao, Zhihao Zhang, Chen Min

TL;DR

This work tackles depth estimation in multi-view stereo by reformulating the learning objective with an adaptive Wasserstein loss and introducing an offset module to produce continuous, sub-pixel depths. It combines a discrete depth probability with per-depth offsets so that the final depth is $d' = d_s + \mathrm{Offset}(u, v, d_s)$, enabling robust learning even when predicted and ground-truth depth distributions do not share supports. Empirically, the approach achieves state-of-the-art results on DTU and Tanks and Temples, and demonstrates scalability to BlendedMVS, with ablations confirming the benefits of both the adaptive Wasserstein loss and the offset module. The method offers a principled, end-to-end framework for accurate, continuous depth estimation in diverse MVS scenarios, potentially improving downstream 3D reconstruction tasks in real-world settings.
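The depth composition described above can be sketched for a single pixel as follows. This is a minimal illustration, not the authors' implementation: it assumes that $d_s$ is chosen as the most probable discrete hypothesis (the selection rule is not stated in this summary), and the function name `continuous_depth` and all numbers are hypothetical.

```python
def continuous_depth(probs, depth_values, offsets):
    """Return d' = d_s + Offset(d_s) for one pixel.

    Assumption: d_s is the hypothesis with the highest predicted probability.
    """
    s = max(range(len(probs)), key=lambda i: probs[i])
    return depth_values[s] + offsets[s]


# Hypothetical values for one pixel (u, v): four depth hypotheses,
# their predicted probabilities, and one predicted offset per hypothesis.
depth_values = [425.0, 427.5, 430.0, 432.5]
probs = [0.05, 0.70, 0.20, 0.05]
offsets = [0.3, 0.8, -0.4, 0.1]

# Selects the hypothesis 427.5 and shifts it by its offset 0.8.
d_prime = continuous_depth(probs, depth_values, offsets)
print(d_prime)
```

The result lies between two discrete hypotheses, which is how the offset is meant to recover sub-pixel precision that the fixed depth grid alone cannot express.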

Abstract

Deep learning has recently demonstrated excellent performance on the task of multi-view stereo (MVS). However, loss functions for deep MVS are rarely studied. In this paper, we first analyze the properties of existing loss functions for deep depth-based MVS approaches. Regression-based loss leads to inaccurate continuous results by computing the mathematical expectation, while classification-based loss outputs discretized depth values. To address this, we propose a novel loss function, named adaptive Wasserstein loss, which narrows the difference between the true and predicted probability distributions of depth. In addition, a simple but effective offset module is introduced to better achieve sub-pixel prediction accuracy. Extensive experiments on different benchmarks, including DTU, Tanks and Temples, and BlendedMVS, show that the proposed method with the adaptive Wasserstein loss and the offset module achieves state-of-the-art performance.
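The contrast between the two existing loss families can be illustrated with a toy depth distribution. The sketch below uses made-up numbers and is not taken from the paper: the expectation used by regression-style methods lands between two peaks, while the classification-style choice is restricted to the discrete grid.

```python
# Hypothetical bimodal depth distribution for one pixel.
depth_values = [1.0, 2.0, 3.0, 4.0, 5.0]
probs = [0.50, 0.05, 0.00, 0.05, 0.40]

# Regression-style readout: the mathematical expectation over hypotheses.
# Here it is close to 2.8, a depth to which the model assigns almost no mass.
expected = sum(p * d for p, d in zip(probs, depth_values))

# Classification-style readout: the most probable hypothesis.
# The output can only be one of the fixed values in depth_values.
most_likely = depth_values[max(range(len(probs)), key=lambda i: probs[i])]

print(expected, most_likely)
```

If the true depth were, say, 1.2, the expectation would miss it by a wide margin, and the discrete choice of 1.0 could never be closer than the grid allows.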

Paper Structure (30 sections, 11 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Illustrations of different loss functions for learning-based MVS. For the regression-based loss, the predicted depth probability may follow a multi-modal distribution, and the expected depth value might be wrong because it is far from any predicted peak. For the classification-based loss, the outputs are chosen from fixed discretized depth values, so sub-pixel accuracy is hard to obtain, especially for scenes with a wide depth range. The proposed adaptive Wasserstein loss with the offset module can obtain more accurate continuous depth values.
  • Figure 2: An overview of our proposed method. It simultaneously predicts a probability for each fixed discrete depth value and an additional offset value for each discrete depth value. The continuous depth values can be obtained by adding them together. This simple but effective offset module helps to improve the discrete depth prediction to achieve sub-pixel depth accuracy. The model is trained end-to-end with the adaptive Wasserstein loss, because the predicted and ground-truth depth distributions may not share any common support.
  • Figure 3: Reconstruction results on the validation set of the BlendedMVS dataset. Our method can reconstruct both small- and large-scale scenes, which demonstrates its reconstruction scalability.
  • Figure 4: Comparison of different loss functions with and without the offset module on the inferred depth map of scan 9 in the DTU dataset.
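
The remark in the Figure 2 caption about distributions without common support can be made concrete with a small sketch. The code below computes the plain one-dimensional Wasserstein-1 distance on a uniform depth grid, which is an assumption for illustration and not the paper's adaptive variant (its exact form is not given in this summary). The helper names and the toy distributions are hypothetical.

```python
import math


def wasserstein_1d(p, q, spacing=1.0):
    """Wasserstein-1 distance between two histograms on the same uniform grid.

    In 1D this equals the summed absolute difference of the two CDFs,
    scaled by the bin spacing.
    """
    cdf_gap = 0.0
    total = 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap) * spacing
    return total


def kl_divergence(target, pred):
    """KL(target || pred); infinite when pred misses mass that target has."""
    total = 0.0
    for t, x in zip(target, pred):
        if t > 0:
            if x == 0:
                return math.inf
            total += t * math.log(t / x)
    return total


# One-hot ground truth and two predictions that share no support with it.
target = [0, 0, 1, 0, 0, 0]
near = [0, 0, 0, 1, 0, 0]  # off by one depth bin
far = [0, 0, 0, 0, 0, 1]   # off by three depth bins

print(wasserstein_1d(target, near), wasserstein_1d(target, far))
print(kl_divergence(target, near), kl_divergence(target, far))
```

The Wasserstein distance grows with how far the predicted mass is from the ground truth, whereas the KL divergence is infinite in both cases and so cannot tell a near miss from a distant one.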