SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization

Zhenlong Yuan, Jiakai Cao, Zhaoxin Li, Hao Jiang, Zhaoqi Wang

TL;DR

SD-MVS introduces segmentation-driven deformation for PatchMatch-based MVS, using SAM instance segmentation to guide patch deformation in both the matching cost and propagation. It combines multi-scale cost aggregation with a spherical gradient refinement of normals, a pixelwise search interval on depths, and automatic hyperparameter tuning via EM optimization, yielding state-of-the-art performance on ETH3D with improved completeness and efficiency on Tanks and Temples. The method demonstrates strong robustness in textureless regions and offers practical benefits for large-scale 3D reconstruction by balancing memory usage and runtime. Overall, SD-MVS advances textureless-area reconstruction through segmentation-aware patch operations, geometry-aware refinement, and data-driven parameter tuning.

Abstract

In this paper, we introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS), a method that can effectively tackle challenges in 3D reconstruction of textureless areas. We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes and further leverage these constraints for pixelwise patch deformation on both matching cost and propagation. Concurrently, we propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals with a pixelwise search interval on depths, significantly improving the completeness of the reconstructed 3D model. Furthermore, we adopt the Expectation-Maximization (EM) algorithm to alternately optimize the aggregate matching cost and hyperparameters, effectively mitigating the problem of parameters being excessively dependent on empirical tuning. Evaluations on the ETH3D high-resolution multi-view stereo benchmark and the Tanks and Temples dataset demonstrate that our method can achieve state-of-the-art results with less time consumption.
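
As a rough illustration of the idea of refining normals in spherical coordinates, the sketch below parameterizes a unit normal by two angles and runs plain gradient descent on them. It is not the paper's formulation: the function names, the finite-difference gradient, the step size, and the toy cost (one minus the dot product with a known target normal) are all assumptions chosen only to make the example self-contained.

```python
import math

def normal_to_spherical(n):
    """Convert a 3D vector to (polar angle theta, azimuth phi)."""
    x, y, z = n
    norm = math.sqrt(x * x + y * y + z * z)
    return math.acos(z / norm), math.atan2(y, x)

def spherical_to_normal(theta, phi):
    """Convert (theta, phi) back to a unit-length 3D normal."""
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))

def refine_normal(n, cost, step=0.2, eps=1e-4, iters=100):
    """Gradient descent on (theta, phi) using central finite differences.

    `cost` is a hypothetical callable mapping a unit normal to a scalar;
    in a real MVS pipeline it would be a matching cost.
    """
    theta, phi = normal_to_spherical(n)
    for _ in range(iters):
        g_theta = (cost(spherical_to_normal(theta + eps, phi))
                   - cost(spherical_to_normal(theta - eps, phi))) / (2 * eps)
        g_phi = (cost(spherical_to_normal(theta, phi + eps))
                 - cost(spherical_to_normal(theta, phi - eps))) / (2 * eps)
        theta -= step * g_theta
        phi -= step * g_phi
    return spherical_to_normal(theta, phi)

# Toy setup (assumed): the cost is lowest when the normal matches `target`.
target = spherical_to_normal(2.0, 0.5)
start = spherical_to_normal(1.7, 0.9)

def toy_cost(n):
    return 1.0 - sum(a * b for a, b in zip(n, target))

refined = refine_normal(start, toy_cost)
```

One reason to work in angles rather than Cartesian components is that every update stays on the unit sphere, so no re-normalization step is needed after each iteration.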

Paper Structure (47 sections, 22 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Comparative analysis of patch deformation strategies between APD-MVS and our approach. APD-MVS (a) selects green anchor pixels that have similar colors but possibly inconsistent depths to help reconstruct the central red pixel, leading to potential inaccuracy. Conversely, our method (b) utilizes neighboring pixels inside the segmentation boundary for reconstruction.
  • Figure 2: An illustrated pipeline of our proposed method. Multi-view images are first downsampled and fed into our multi-scale architecture. Leveraging SAM-based segmentation, we carry out patch deformation on the matching cost to obtain the multi-scale matching cost $C_{ms}$. Integrating $C_{ms}$ with the projection color error $C_{pc}$ and the reprojection error $C_{rp}$ yields the aggregated cost. We then again employ SAM-based segmentation for patch deformation in propagation, followed by load balancing within each search domain. Subsequently, we alternately iterate spherical gradient refinement on normals and pixelwise search interval on depths for enhanced accuracy. Finally, we employ EM-based optimization to tune the hyperparameters $w_{ms}$, $w_{rp}$, $w_{pc}$ and reassign them for the next iteration.
  • Figure 3: Comparative analysis of patch deformation strategies between SAM-based instance segmentation and Canny edge detection on selected scenes of the ETH3D dataset (office and kicker). From top to bottom, (a), (b) and (c) respectively show the original images, the SAM-based segmentation results and the Canny edge detection results. Representative areas in red boxes illustrate the advantages of SAM-based segmentation over Canny edge detection.
  • Figure 4: Patch deformation on matching cost. (a) is the matching cost scheme from ACMMP, (b) shows the distance in each direction and (c) illustrates the deformed patch.
  • Figure 5: Patch deformation on propagation. (a) is the propagation pattern of ACMMP, (b) depicts the length of each propagation branch, and (c) illustrates different search domains with different colors.
  • ...and 5 more figures
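
The Figure 2 caption describes an aggregated cost built from $C_{ms}$, $C_{rp}$ and $C_{pc}$, with weights $w_{ms}$, $w_{rp}$, $w_{pc}$ that are re-tuned between iterations. The captions do not give the exact form of the aggregation, so the sketch below assumes a plain weighted sum purely for illustration; the function names, hypothesis labels, and numeric values are all hypothetical. It shows how reassigning the weights can change which depth hypothesis is preferred.

```python
def aggregate_cost(c_ms, c_rp, c_pc, w_ms=1.0, w_rp=1.0, w_pc=1.0):
    """Assumed form: weighted sum of the three cost terms."""
    return w_ms * c_ms + w_rp * c_rp + w_pc * c_pc

def best_hypothesis(hypotheses, weights):
    """Return the key of the hypothesis with the lowest aggregated cost.

    `hypotheses` maps a label to a (c_ms, c_rp, c_pc) tuple;
    `weights` is a (w_ms, w_rp, w_pc) tuple.
    """
    return min(hypotheses,
               key=lambda k: aggregate_cost(*hypotheses[k], *weights))

# Made-up cost values for two candidate hypotheses at one pixel.
hypotheses = {
    "h0": (0.40, 0.10, 0.20),
    "h1": (0.25, 0.30, 0.05),
}

equal_weights = (1.0, 1.0, 1.0)
reprojection_heavy = (1.0, 2.0, 1.0)

choice_equal = best_hypothesis(hypotheses, equal_weights)
choice_heavy = best_hypothesis(hypotheses, reprojection_heavy)
```

With equal weights the second hypothesis has the lower total, while doubling the reprojection weight favors the first. This sensitivity to the weights is the kind of dependence that automatic tuning is meant to take out of manual hands.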