MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
Zhenlong Yuan, Cong Liu, Fei Shen, Zhaoxin Li, Jinguo Luo, Tianlu Mao, Zhaoqi Wang
TL;DR
MSP-MVS addresses the persistent challenge of reconstructing textureless regions in multi-view stereo by introducing edge-confined patch deformation guided by a multi-granularity segmentation prior derived from Semantic-SAM. The method combines edge aggregation and CRF-based refinement to align depth edges with semantic boundaries, and then applies adaptive equidistribution and disassemble-clustering to achieve attention-consistent deformation. A disparity-sampling synergistic 3D optimization jointly randomizes sampling pixels and disparities to escape local minima and identify global-minimum matching costs. Evaluations on ETH3D and Tanks & Temples demonstrate state-of-the-art performance and strong generalization, with efficient memory usage and practical runtime. Together, these components integrate semantic priors with geometric matching to improve 3D reconstruction in challenging textureless scenes.
Abstract
Recently, patch deformation-based methods have demonstrated significant strength in multi-view stereo by adaptively expanding the receptive field of patches to help reconstruct textureless areas. However, such methods mainly concentrate on searching for pixels without matching ambiguity (i.e., reliable pixels) when constructing deformed patches, while neglecting the deformation instability caused by unexpected edge-skipping, which can lead to matching distortions. To address this, we propose MSP-MVS, a method that introduces a multi-granularity segmentation prior for edge-confined patch deformation. Specifically, to avoid unexpected edge-skipping, we first aggregate and further refine multi-granularity depth edges obtained from Semantic-SAM as a prior to guide patch deformation within depth-continuous (i.e., homogeneous) areas. Moreover, to address the attention imbalance caused by edge-confined patch deformation, we implement adaptive equidistribution and disassemble-clustering of correlative reliable pixels (i.e., anchors), thereby promoting attention-consistent patch deformation. Finally, to prevent deformed patches from falling into local-minimum matching costs caused by the fixed sampling pattern, we introduce disparity-sampling synergistic 3D optimization to help identify global-minimum matching costs. Evaluations on the ETH3D and Tanks & Temples benchmarks show that our method achieves state-of-the-art performance with remarkable generalization.
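To make the idea of edge-confined deformation concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a binary depth-edge mask (for example, one derived from a segmentation prior) and a binary mask of reliable pixels, and walks outward from a patch center in eight directions, stopping each walk at the first edge pixel so that no anchor is chosen across an edge. The function name `edge_confined_anchors`, the eight-direction search, and the `max_steps` parameter are assumptions made for illustration only.

```python
import numpy as np

# Eight search directions as (dy, dx) offsets, clockwise from "up".
DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def edge_confined_anchors(center, edge_mask, reliable_mask, max_steps=50):
    """Collect at most one reliable pixel per direction without crossing an edge.

    Hypothetical helper for illustration: `edge_mask` and `reliable_mask` are
    boolean arrays of the same shape; `center` is a (row, col) tuple.
    """
    h, w = edge_mask.shape
    anchors = []
    for dy, dx in DIRECTIONS:
        y, x = center
        for _ in range(max_steps):
            y, x = y + dy, x + dx
            if not (0 <= y < h and 0 <= x < w):
                break  # left the image: no anchor in this direction
            if edge_mask[y, x]:
                break  # hit a depth edge: stop instead of skipping over it
            if reliable_mask[y, x]:
                anchors.append((y, x))  # first reliable pixel on this ray
                break
    return anchors


# Toy example: a vertical edge at column 5 separates two regions.
edge = np.zeros((7, 7), dtype=bool)
edge[:, 5] = True
reliable = np.zeros((7, 7), dtype=bool)
reliable[3, 1] = True   # same side as the center
reliable[0, 3] = True   # same side as the center
reliable[3, 6] = True   # beyond the edge, so it must not be selected

print(edge_confined_anchors((3, 3), edge, reliable))
```

In this toy case the reliable pixel at (3, 6) is ignored because the rightward walk stops at the edge in column 5. An unconstrained search would have selected it, which is the kind of edge-skipping the abstract describes.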
