GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo
Jiang Wu, Rui Li, Haofei Xu, Wenxun Zhao, Yu Zhu, Jinqiu Sun, Yanning Zhang
TL;DR
This work addresses geometric inconsistency in MVS cost volumes by introducing GoMVS, which propagates and aggregates costs in a geometrically informed manner. The core innovation is the geometrically consistent propagation (GCP) module, which uses a local planar model and surface normals to map neighboring depths to the reference depth space before aggregation, integrated into a 3D U-Net framework. The authors systematically compare normal cue sources and demonstrate state-of-the-art performance across DTU, Tanks & Temples, and ETH3D, with notable improvements in completeness and robustness, including a top rank on the Tanks & Temples Advanced benchmark. The approach offers practical benefits for high-quality 3D reconstruction in challenging scenes, and the study highlights monocular normals as a robust complement to multi-view cues.
Abstract
Matching cost aggregation plays a fundamental role in learning-based multi-view stereo networks. However, directly aggregating adjacent costs can lead to suboptimal results due to local geometric inconsistency. Related methods either seek selective aggregation or improve the aggregated depth in 2D space, and neither handles geometric inconsistency in the cost volume effectively. In this paper, we propose GoMVS to aggregate geometrically consistent costs, yielding better utilization of adjacent geometries. More specifically, we establish correspondences for adjacent costs and propagate them to the reference pixel by leveraging local geometric smoothness in conjunction with surface normals. We achieve this with the geometrically consistent propagation (GCP) module. It computes the correspondence from the adjacent depth hypothesis space to the reference depth space using surface normals, then uses the correspondence to propagate adjacent costs to the reference geometry, followed by a convolution for aggregation. Our method achieves new state-of-the-art performance on the DTU, Tanks & Temples, and ETH3D datasets. Notably, our method ranks 1st on the Tanks & Temples Advanced benchmark.
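The propagation step described above can be illustrated with a minimal NumPy sketch for a single pair of pixels. It is not the authors' implementation. It assumes a pinhole camera with intrinsics `K`, a surface normal expressed in camera coordinates, and a local plane shared by the reference pixel `p` and the adjacent pixel `q`. The function names (`pixel_ray`, `plane_depth_ratio`, `propagate_costs`) and the use of linear interpolation along the depth axis are illustrative assumptions. Under the planar model, the 3D points at `p` and `q` satisfy n · (d_q r_q) = n · (d_p r_p), so each reference depth hypothesis maps to one depth at the adjacent pixel, where its cost can be sampled.

```python
import numpy as np

def pixel_ray(pixel, K):
    """Back-project pixel (u, v) to a camera-space ray (z = 1 for a standard K)."""
    u, v = pixel
    return np.linalg.inv(K) @ np.array([u, v, 1.0])

def plane_depth_ratio(p, q, normal, K):
    """Return d_q / d_p for two pixels whose 3D points lie on one plane.

    Derived from n . (d_q * ray_q) = n . (d_p * ray_p).
    """
    num = normal @ pixel_ray(p, K)
    den = normal @ pixel_ray(q, K)
    if abs(den) < 1e-8:
        raise ValueError("ray through q is (nearly) parallel to the plane")
    return num / den

def propagate_costs(costs_q, depths_q, depths_p, p, q, normal, K):
    """Resample the adjacent pixel's costs at the reference depth hypotheses.

    Assumption: depths_q is sorted in increasing order. Hypotheses that map
    outside the range of depths_q take the nearest edge cost (np.interp clamps).
    """
    mapped = np.asarray(depths_p, dtype=float) * plane_depth_ratio(p, q, normal, K)
    return np.interp(mapped, depths_q, costs_q)
```

For a fronto-parallel plane (normal along the optical axis) the ratio is 1, so the sketch reduces to sampling the adjacent cost at the same depth, which is what direct aggregation implicitly assumes. For slanted surfaces the ratio differs from 1, and the adjacent costs are shifted along the depth axis before any aggregation takes place.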
