GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

Jiang Wu, Rui Li, Haofei Xu, Wenxun Zhao, Yu Zhu, Jinqiu Sun, Yanning Zhang

TL;DR

This work addresses geometric inconsistency in MVS cost volumes by introducing GoMVS, which propagates and aggregates costs in a geometrically informed manner. The core innovation is the geometrically consistent propagation (GCP) module, which uses a local planar model and surface normals to map neighboring depths to the reference depth space before aggregation, integrated into a 3D U-Net framework. The authors systematically compare normal cue sources and demonstrate state-of-the-art performance across DTU, Tanks and Temples, and ETH3D, with notable improvements in completeness and robustness, including a top rank on the Tanks and Temples Advanced benchmark. The approach offers practical benefits for high-quality 3D reconstructions in challenging scenes, and the study highlights monocular normals as a robust complement to multi-view cues.

Abstract

Matching cost aggregation plays a fundamental role in learning-based multi-view stereo networks. However, directly aggregating adjacent costs can lead to suboptimal results due to local geometric inconsistency. Related methods either seek selective aggregation or improve aggregated depth in the 2D space; neither handles geometric inconsistency in the cost volume effectively. In this paper, we propose GoMVS to aggregate geometrically consistent costs, yielding better utilization of adjacent geometries. More specifically, we correspond and propagate adjacent costs to the reference pixel by leveraging the local geometric smoothness in conjunction with surface normals. We achieve this with the geometrically consistent propagation (GCP) module. It computes the correspondence from the adjacent depth hypothesis space to the reference depth space using surface normals, then uses the correspondence to propagate adjacent costs to the reference geometry, followed by a convolution for aggregation. Our method achieves new state-of-the-art performance on the DTU, Tanks and Temples, and ETH3D datasets. Notably, our method ranks 1st on the Tanks and Temples Advanced benchmark.
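
To make the planar mapping concrete, here is a minimal sketch of how a depth at one pixel can be transferred to a nearby pixel when both are assumed to lie on the same local plane with a known surface normal. The function name `propagate_depth_planar`, the pinhole intrinsics `K`, and the closed form used are illustrative assumptions, not the paper's own code; the exact formulation in GoMVS may differ.

```python
import numpy as np

def propagate_depth_planar(K, normal, q, d_q, p):
    """Depth at pixel p implied by the plane through pixel q's 3D point.

    Assumes a pinhole camera with intrinsics K and that p and q lie on one
    plane with the given normal (camera coordinates). Hypothetical helper.
    """
    K_inv = np.linalg.inv(K)
    ray_q = K_inv @ np.array([q[0], q[1], 1.0])  # back-projected ray of q
    ray_p = K_inv @ np.array([p[0], p[1], 1.0])  # back-projected ray of p
    denom = normal @ ray_p
    if abs(denom) < 1e-8:
        raise ValueError("viewing ray of p is parallel to the plane")
    # Plane constraint: normal . (d_p * ray_p) == normal . (d_q * ray_q)
    return d_q * (normal @ ray_q) / denom

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
tilted = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
d_p = propagate_depth_planar(K, tilted, q=(330, 240), d_q=2.0, p=(331, 240))
```

For a fronto-parallel normal `(0, 0, 1)` the depth is passed through unchanged, which corresponds to the case where plain neighborhood aggregation is already geometrically consistent; for a tilted surface the two depths differ, which is the inconsistency the abstract refers to.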

Paper Structure (27 sections, 11 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Comparison of reconstruction errors on the Tanks and Temples benchmark. We show precision and recall error maps for the "Horse" scan. Our method demonstrates notable improvements over existing methods in challenging areas.
  • Figure 2: Overview of our method. Given a reference image and a set of source images, we use FPN to extract multi-scale features for cost volume reconstruction. To conduct geometrically consistent aggregation within the local window, we collect adjacent geometric cues and send them to the proposed geometrically consistent propagation (GCP) module, which computes the correspondence from the adjacent depth hypothesis space to the reference depth space. The resulting costs are endowed with geometric consistency, which facilitates better utilization of adjacent geometry and can be aggregated by the convolution.
  • Figure 3: Comparison of reconstruction results. Our method reconstructs more complete results in challenging areas.
  • Figure 4: Qualitative results on Tanks and Temples. Our method achieves detailed and complete reconstructions across different scenes.
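
The propagate-then-aggregate step described in the Figure 2 caption can be sketched for a single neighbor as follows. This is a simplified illustration under stated assumptions: `scale` stands for the depth ratio implied by the local plane (neighbor depth = `scale` × reference depth), linear interpolation is used to resample the neighbor's cost curve, and a plain average stands in for the learned convolution. The names `resample_neighbor_cost` and `aggregate` are hypothetical and do not come from the paper.

```python
import numpy as np

def resample_neighbor_cost(ref_depths, nbr_depths, nbr_cost, scale):
    """Read a neighbor's cost curve at depths matching the reference hypotheses.

    nbr_depths must be increasing; queries outside its range are clamped
    to the end values by np.interp.
    """
    query = scale * np.asarray(ref_depths, dtype=float)
    return np.interp(query, nbr_depths, nbr_cost)

def aggregate(ref_cost, propagated_costs):
    """Average the reference cost with propagated neighbor costs.

    A stand-in for the learned convolution used for aggregation.
    """
    stacked = np.vstack([ref_cost, *propagated_costs])
    return stacked.mean(axis=0)

depths = np.array([1.0, 2.0, 3.0, 4.0])       # shared depth hypotheses
ref_cost = np.array([0.9, 0.1, 0.8, 0.7])     # cost curve at the reference pixel
nbr_cost = np.array([0.9, 0.8, 0.7, 0.1])     # cost curve at one neighbor
aligned = resample_neighbor_cost(depths, depths, nbr_cost, scale=2.0)
fused = aggregate(ref_cost, [aligned])
```

In this toy setup the neighbor's cost minimum sits at a different hypothesis index than the reference's; after resampling with the plane-implied ratio, both minima line up at the same reference hypothesis, so averaging reinforces it instead of blurring it.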