DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo

Zhenlong Yuan, Jinguo Luo, Fei Shen, Zhaoxin Li, Cong Liu, Tianlu Mao, Zhaoqi Wang

TL;DR

This work addresses robust multi-view stereo in textureless regions where patch deformation can falter due to edge-skipping and occlusions. It introduces DVP-MVS, which synergizes depth-edge aligned priors derived from Depth Anything V2 and Roberts edges with cross-view visibility priors to enable visibility-aware patch deformation. Key contributions include an erosion-dilation based depth-edge alignment to generate fine-grained homogeneous boundaries, a visibility map restoration via cross-view reprojection, and geometry-driven propagation and refinement using aggregated visible hemispherical normals and adaptive depth intervals along epipolar lines. On ETH3D and Tanks & Temples, DVP-MVS achieves state-of-the-art performance with strong robustness and generalization across textureless and cluttered scenes.

Abstract

Patch deformation-based methods have recently proven effective in multi-view stereo, owing to deformable and expandable perception that helps reconstruct textureless areas. However, such approaches typically focus on finding correlated reliable pixels to alleviate matching ambiguity during patch deformation, and ignore the deformation instability caused by mistaken edge-skipping and visibility occlusion, which can lead to estimation deviation. To remedy these issues, we propose DVP-MVS, which synergizes a depth-edge aligned prior and a cross-view prior for robust and visibility-aware patch deformation. Specifically, to avoid unexpected edge-skipping, we first use Depth Anything V2 followed by the Roberts operator to initialize coarse depth and edge maps, respectively, and then align the two through an erosion-dilation strategy to generate fine-grained homogeneous boundaries that guide patch deformation. In addition, we reform view selection weights as visibility maps and restore visible areas by cross-view depth reprojection, then treat the result as the cross-view prior to facilitate visibility-aware patch deformation. Finally, we improve propagation and refinement with multi-view geometric consistency by introducing aggregated visible hemispherical normals based on view selection and local projection depth differences based on epipolar lines, respectively. Extensive evaluations on the ETH3D and Tanks & Temples benchmarks demonstrate that our method achieves state-of-the-art performance with excellent robustness and generalization.
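
To make the two image-processing primitives named in the abstract concrete, the following is a minimal sketch of a Roberts cross edge detector applied to a depth map, and of erosion followed by dilation on a binary mask. It is not the authors' depth-edge alignment procedure, whose details are not given here. The function names, the threshold, the 3x3 structuring element, and the pure-Python list-of-lists image representation are all assumptions made for illustration.

```python
def roberts_edges(img, thresh):
    """Binary edge mask from the Roberts cross gradient magnitude.

    img is a list of rows of floats (e.g. a coarse depth map);
    thresh is a hypothetical magnitude threshold.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x] - img[y + 1][x + 1]      # main-diagonal difference
            gy = img[y][x + 1] - img[y + 1][x]      # anti-diagonal difference
            if (gx * gx + gy * gy) ** 0.5 > thresh:
                edges[y][x] = 1
    return edges


def _morph(mask, op):
    """Apply op (min or max) over each pixel's 3x3 window, clipped at borders."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = op(window)
    return out


def erode(mask):
    return _morph(mask, min)


def dilate(mask):
    return _morph(mask, max)


def open_mask(mask):
    """Erosion followed by dilation: removes specks smaller than the window."""
    return dilate(erode(mask))
```

On a depth map with a single vertical depth step, `roberts_edges` marks the column where the step occurs, and `open_mask` keeps a 3x3 region intact while deleting an isolated one-pixel speck. How DVP-MVS combines such operations to align depth boundaries with image edges is described in the paper itself, not in this sketch.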

Paper Structure

This paper contains 41 sections, 11 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Comparison between patch deformation-based methods and ours. Edge-skipping causes other methods (a) to incorrectly select reliable but depth-discontinuous gray pixels for patch deformation of the central unreliable red pixel, whereas our DVP-MVS (b) leverages the depth-edge aligned prior to keep deformed patches within homogeneous areas.
  • Figure 2: Pipeline of DVP-MVS. We first adopt Depth Anything V2 followed by the Roberts operator to initialize corresponding depth and edge maps, respectively. We then employ an erosion-dilation strategy to extract the depth-edge aligned prior for robust patch deformation. Subsequently, we construct visibility maps by reforming view selection and adopting the reprojection-based post-verification for visibility map restoration, which are then treated as the cross-view prior to facilitate visibility-aware patch deformation. Finally, by considering geometric consistency, we respectively improve the propagation and refinement stages by introducing visible normals aggregation and epipolar line projection. After several iterations we obtain depth images.
  • Figure 3: Depth-Edge Aligned Prior. Edges are highlighted in white in (c), with black constituting dispersed regions. Different colors denote different dispersed regions in (d) and homogeneous areas in (e), with black indicating their boundaries. In (f) and (g), green, blue and red respectively denote the central pixel, neighbors in conventional PM and anchors in deformable PM. Cyan, green and gray backgrounds respectively denote heterogeneous areas, homogeneous areas and homogeneous boundaries.
  • Figure 4: Cross-View Prior. The red line in (a) separates visible and invisible areas of (a) within (b). In (c) and (d), black indicates pixels judged invisible by the view selection strategy. In (e) and (f), green, blue and red respectively indicate the central pixel, conventional PM neighbors and deformable PM anchors.
  • Figure 5: Geometry-Driven Propagation and Refinement. In (a) and (b), the blue view cone corresponds to the reference image $I_i$, while green, red, and yellow view cones correspond to visible source images $I^v_0$, $I^v_1$ and $I^v_2$.
  • ...and 6 more figures
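
The cross-view prior in Figures 2 and 4 relies on reprojecting depth from one view into another. As a rough illustration of that general idea, the sketch below performs a generic depth-consistency check under a pinhole camera model: a reference pixel is back-projected with its depth, moved into a source camera frame, projected, and compared against the source depth map. This is an assumed, simplified formulation rather than the paper's visibility map restoration. The function names, the shared intrinsics tuple `K = (fx, fy, cx, cy)`, the pose convention `p_src = R * p_ref + t`, and the relative tolerance `rel_tol` are all hypothetical.

```python
def backproject(u, v, d, K):
    """Pixel (u, v) with depth d -> 3D point in the camera frame."""
    fx, fy, cx, cy = K
    return ((u - cx) * d / fx, (v - cy) * d / fy, d)


def transform(p, R, t):
    """Rigid transform p' = R p + t, with R as a 3x3 nested list."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p, K):
    """3D point in the camera frame -> (u, v, depth)."""
    fx, fy, cx, cy = K
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy, z)


def is_visible(u, v, d_ref, K, R, t, depth_src, rel_tol=0.01):
    """True if the reference pixel's reprojected depth agrees with depth_src."""
    p_src = transform(backproject(u, v, d_ref, K), R, t)
    if p_src[2] <= 0:                       # behind the source camera
        return False
    us, vs, z = project(p_src, K)
    ui, vi = round(us), round(vs)           # nearest-pixel lookup
    h, w = len(depth_src), len(depth_src[0])
    if not (0 <= vi < h and 0 <= ui < w):   # falls outside the source image
        return False
    return abs(depth_src[vi][ui] - z) <= rel_tol * z
```

A pixel whose reprojected depth is much larger than the source depth at the same location is likely occluded in that source view, which is why the check returns `False` in that case. Running such a check per pixel and per source view yields a binary map that plays the role of a visibility map in this simplified setting.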