ICG-MVSNet: Learning Intra-view and Cross-view Relationships for Guidance in Multi-View Stereo
Yuxi Hu, Jun Zhang, Zhe Zhang, Rafael Weilharter, Yuchen Rao, Kuangyi Chen, Runze Yuan, Friedrich Fraundorfer
TL;DR
ICG-MVSNet tackles depth estimation in multi-view stereo by explicitly exploiting geometric information both within a single view and across views. It introduces Intra-View Fusion (IVF), which encodes coordinate dependencies in a lightweight manner, and Cross-View Aggregation (CVA), which propagates contextual priors across stages and depth hypotheses, all within a four-stage coarse-to-fine framework. A compact 3D-CNN regularizer produces a probability volume over the depth hypotheses, and the network is trained with a pixel-wise cross-entropy loss applied at every stage. On DTU and Tanks & Temples, the method achieves competitive or superior accuracy and completeness, with lower memory consumption and faster inference than many competing methods, which makes it practical for 3D reconstruction.
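As a rough illustration of the classification-style supervision described above, the sketch below samples D depth hypotheses around a previous-stage depth map, turns regularized scores into a probability volume with a softmax over the hypothesis axis, and computes a masked pixel-wise cross-entropy against the hypothesis nearest to the ground-truth depth. The helper names (`stage_hypotheses`, `pixelwise_ce`, `winner_take_all`), the uniform hypothesis spacing, and the nearest-hypothesis target are assumptions made for illustration. This is not ICG-MVSNet's released code.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax over the depth-hypothesis axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def stage_hypotheses(prev_depth, num_hyp, interval):
    # Hypothetical helper: num_hyp depths uniformly spaced around the
    # previous-stage estimate (coarse-to-fine refinement).
    offsets = (np.arange(num_hyp) - (num_hyp - 1) / 2.0) * interval  # (D,)
    return prev_depth[None, :, :] + offsets[:, None, None]           # (D, H, W)

def pixelwise_ce(cost, hyps, gt_depth, mask):
    # cost: (D, H, W) regularized scores; hyps: (D, H, W) depth values;
    # gt_depth, mask: (H, W). Returns the mean NLL over valid pixels.
    prob = softmax(cost, axis=0)
    # Assumed target: index of the hypothesis closest to the ground truth.
    target = np.abs(hyps - gt_depth[None]).argmin(axis=0)        # (H, W)
    p_true = np.take_along_axis(prob, target[None], axis=0)[0]   # (H, W)
    nll = -np.log(np.clip(p_true, 1e-8, 1.0))
    return float((nll * mask).sum() / max(mask.sum(), 1))

def winner_take_all(cost, hyps):
    # Depth estimate: the hypothesis with the highest probability per pixel.
    idx = softmax(cost, axis=0).argmax(axis=0)
    return np.take_along_axis(hyps, idx[None], axis=0)[0]
```

Under this reading, the total training loss would be the sum (possibly weighted) of `pixelwise_ce` over the four stages, with each later stage using a smaller `interval` centred on the previous stage's depth.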
Abstract
Multi-view Stereo (MVS) aims to estimate depth and reconstruct 3D point clouds from a series of overlapping images. Recent learning-based MVS frameworks overlook the geometric information embedded in features and correlations, which leads to weak cost matching. In this paper, we propose ICG-MVSNet, which explicitly integrates intra-view and cross-view relationships for depth estimation. Specifically, we develop an intra-view feature fusion module that leverages feature coordinate correlations within a single image to make cost matching more robust. Additionally, we introduce a lightweight cross-view aggregation module that efficiently uses the contextual information in volume correlations to guide regularization. We evaluate our method on the DTU dataset and the Tanks and Temples benchmark, where it consistently achieves competitive performance against state-of-the-art methods while requiring fewer computational resources.
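To make the notion of "volume correlations" concrete, the sketch below builds a group-wise correlation volume between a reference feature map and source features that are assumed to be already warped onto each depth hypothesis, then averages over source views. Group-wise correlation is a common choice in learning-based MVS. The abstract does not specify ICG-MVSNet's exact correlation, so the function names, the group split, and the mean over views are all illustrative assumptions.

```python
import numpy as np

def groupwise_correlation(ref_feat, warped_src, groups):
    # ref_feat: (C, H, W) reference features.
    # warped_src: (D, C, H, W) source features, assumed already warped
    # to each of the D depth hypotheses (e.g. by homography warping).
    C, H, W = ref_feat.shape
    D = warped_src.shape[0]
    if C % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    ref = ref_feat.reshape(1, groups, C // groups, H, W)
    src = warped_src.reshape(D, groups, C // groups, H, W)
    # Mean inner product inside each channel group -> (D, G, H, W).
    return (ref * src).mean(axis=2)

def aggregate_views(ref_feat, warped_srcs, groups):
    # Hypothetical multi-view aggregation: average the per-view volumes.
    vols = [groupwise_correlation(ref_feat, w, groups) for w in warped_srcs]
    return np.mean(vols, axis=0)
```

A regularizer such as the 3D-CNN mentioned above would then consume the (D, G, H, W) volume. Higher correlation at a hypothesis indicates better feature agreement at that depth.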
