MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo

Zhenlong Yuan, Cong Liu, Fei Shen, Zhaoxin Li, Jinguo Luo, Tianlu Mao, Zhaoqi Wang

TL;DR

MSP-MVS addresses the persistent challenge of reconstructing textureless regions in multi-view stereo by introducing edge-confined patch deformation guided by a multi-granularity segmentation prior derived from Semantic-SAM. The method combines edge aggregation and CRF-based refinement to align depth edges with semantic boundaries, and then applies adaptive equidistribution and disassemble-clustering to achieve attention-consistent deformation. A disparity-sampling synergistic 3D optimization jointly randomizes sampling pixels and disparities to escape local minima and identify global-minimum matching costs. Evaluations on ETH3D and Tanks & Temples demonstrate state-of-the-art performance and strong generalization, with efficient memory usage and practical runtime. Together, these components provide a robust, scalable approach for high-quality 3D reconstruction in challenging textureless scenes, illustrating a productive integration of semantic priors with geometric matching.

Abstract

Recently, patch deformation-based methods have demonstrated significant strength in multi-view stereo by adaptively expanding the receptive field of patches to help reconstruct textureless areas. However, such methods mainly concentrate on searching for pixels without matching ambiguity (i.e., reliable pixels) when constructing deformed patches, while neglecting the deformation instability caused by unexpected edge-skipping, which can lead to matching distortions. To address this, we propose MSP-MVS, a method that introduces a multi-granularity segmentation prior for edge-confined patch deformation. Specifically, to avoid unexpected edge-skipping, we first aggregate and further refine multi-granularity depth edges obtained from Semantic-SAM as a prior to guide patch deformation within depth-continuous (i.e., homogeneous) areas. Moreover, to address the attention imbalance caused by edge-confined patch deformation, we implement adaptive equidistribution and disassemble-clustering of correlative reliable pixels (i.e., anchors), thereby promoting attention-consistent patch deformation. Finally, to prevent deformed patches from falling into local-minimum matching costs caused by the fixed sampling pattern, we introduce disparity-sampling synergistic 3D optimization to help identify global-minimum matching costs. Evaluations on the ETH3D and Tanks & Temples benchmarks show that our method achieves state-of-the-art performance with remarkable generalization.
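The difference between edge-skipping and edge-confined anchor search described above can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: the function name `find_anchors`, the one-ray-per-direction walk, and the binary `edge` and `reliable` grids are all assumptions made for illustration. Each ray returns the first reliable pixel it meets; with `confine=True` the ray stops at a depth edge, so anchors stay inside the homogeneous area around the central pixel.

```python
import math

def find_anchors(edge, reliable, center, num_rays=8, max_steps=50, confine=True):
    """Walk one ray per direction from `center`; keep the first reliable pixel on each.

    edge, reliable: 2D lists of 0/1 (rows x cols); center: (row, col).
    confine=True  -> a ray stops at the first edge pixel (edge-confined search).
    confine=False -> a ray walks through edges (edge-skipping behaviour).
    Hypothetical helper for illustration only.
    """
    rows, cols = len(edge), len(edge[0])
    anchors = []
    for k in range(num_rays):
        angle = 2.0 * math.pi * k / num_rays
        dr, dc = -math.sin(angle), math.cos(angle)  # image rows grow downward
        for step in range(1, max_steps + 1):
            r = center[0] + int(round(step * dr))
            c = center[1] + int(round(step * dc))
            if not (0 <= r < rows and 0 <= c < cols):
                break  # left the image
            if confine and edge[r][c]:
                break  # depth edge reached: do not cross it
            if reliable[r][c]:
                anchors.append((r, c))
                break
    return anchors

# Toy scene: a vertical depth edge at column 4 splits a 5 x 9 image.
edge = [[1 if c == 4 else 0 for c in range(9)] for _ in range(5)]
reliable = [[0] * 9 for _ in range(5)]
reliable[2][0] = 1  # reliable pixel on the same side as the central pixel
reliable[2][7] = 1  # reliable pixel across the depth edge

print(find_anchors(edge, reliable, (2, 2), confine=False))  # [(2, 7), (2, 0)]
print(find_anchors(edge, reliable, (2, 2), confine=True))   # [(2, 0)]
```

In the unconfined call the anchor at `(2, 7)` lies across the depth edge, which is the situation the abstract calls edge-skipping; the confined call discards it.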

Paper Structure

This paper contains 38 sections, 12 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Comparative analysis between APD-MVS and our method. In (c) and (d), green, blue and red dots respectively denote the central pixel, conventional PM and deformable PM. Because APD-MVS (c) lacks depth edge guidance, its deformable PM suffers from edge-skipping and therefore covers areas with depth discontinuities. In contrast, our method (d) leverages the multi-granularity segmentation image (b) as a prior to guide deformable PM within homogeneous areas.
  • Figure 2: Pipeline of MSP-MVS. We first adopt Semantic-SAM to obtain multi-granularity segmentation images. We then aggregate and further refine these images into a multi-granularity segmentation prior to facilitate edge-confined patch deformation. Subsequently, we propose adaptive equidistribution for sector division and a disassemble-clustering strategy for anchor clustering to promote attention-consistent patch deformation. Additionally, we introduce disparity-sampling synergistic 3D optimization to help deformed patches identify their global-minimum matching costs. After several iterations, we obtain the final depth images.
  • Figure 3: Multi-Granularity Segmentation Prior. From (a) to (d), purple, green and gray backgrounds respectively denote homogeneous areas, heterogeneous areas and depth edges, with sectors divided at a fixed angle of $45^{\circ}$ by black dashed lines. In (c), purple, blue, and green dots respectively denote pixels whose optimal anchor subsets $S_{max}$ equal $S_1$, $S_2$, and $S_3$. Since most pixels are purple in (c) (i.e., $S_{max} = S_1$), the scene-level masks $M^a_1$ and $M^b_1$ in (b) are reliable, while the others are misidentified masks.
  • Figure 4: Attention-Consistent Patch Deformation. Blue and red dots respectively denote the central pixel and anchors. Purple, green and gray backgrounds respectively denote homogeneous areas, heterogeneous areas and depth edges.
  • Figure 5: Disparity-Sampling Synergistic 3D Optimization. (a) and (b) respectively represent the fixed sampling pattern and 2D cost optimization employed by APD-MVS. (c) and (d) respectively illustrate the sampling pixel randomization and 3D cost profile optimization of our proposed method.
  • ...and 6 more figures
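
The fixed $45^{\circ}$ sector division mentioned in the Figure 3 caption can be made concrete with a small sketch. It only bins anchors into equal angular sectors around a central pixel and counts them; it does not implement the paper's adaptive equidistribution, whose details are not given here. The helper names `sector_index` and `anchors_per_sector` and the sample coordinates are hypothetical.

```python
import math
from collections import Counter

def sector_index(center, pixel, sector_deg=45.0):
    """Index of the fixed angular sector (counter-clockwise from +x) containing `pixel`."""
    n = int(round(360.0 / sector_deg))
    dy = center[0] - pixel[0]  # image rows grow downward, so flip the sign
    dx = pixel[1] - center[1]
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return int(angle // sector_deg) % n  # modulo guards the 360-degree wrap-around

def anchors_per_sector(center, anchors, sector_deg=45.0):
    """Count how many anchors fall into each fixed sector."""
    n = int(round(360.0 / sector_deg))
    counts = Counter(sector_index(center, a, sector_deg) for a in anchors)
    return [counts.get(k, 0) for k in range(n)]

# Anchors that all lie above the central pixel, e.g. because a depth edge
# blocks the search below it (coordinates are made up for illustration).
center = (5, 5)
anchors = [(4, 8), (3, 6), (2, 4), (4, 9)]
print(anchors_per_sector(center, anchors))  # [2, 1, 1, 0, 0, 0, 0, 0]
```

With fixed sectors, five of the eight sectors receive no anchor in this toy case, which is one way to picture the attention imbalance that adaptive equidistribution is meant to address.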