Segmentation-aware Prior Assisted Joint Global Information Aggregated 3D Building Reconstruction

Hongxin Peng, Yongjian Liao, Weijun Li, Chuanyu Fu, Guoxin Zhang, Ziquan Ding, Zijie Huang, Qiku Cao, Shuting Cai

TL;DR

This work proposes an algorithm that accurately segments weakly-textured regions and constructs their plane priors. Combined with triangulation priors, these plane priors form a reliable prior candidate set. The work also introduces a novel global information aggregation cost function.

Abstract

Multi-View Stereo plays a pivotal role in civil engineering by facilitating 3D modeling, precise engineering surveying, quantitative analysis, and monitoring and maintenance. It offers high-precision, real-time spatial information that is crucial for various engineering projects. However, Multi-View Stereo algorithms struggle to reconstruct weakly-textured regions within large-scale building scenes. In these areas, the stereo matching of pixels often fails, leading to inaccurate depth estimates. Based on the Segment Anything Model and the RANSAC algorithm, we propose an algorithm that accurately segments weakly-textured regions and constructs their plane priors. These plane priors, combined with triangulation priors, form a reliable prior candidate set. Additionally, we introduce a novel global information aggregation cost function. This function selects the optimal plane prior based on global information in the prior candidate set, constrained by geometric consistency during the depth estimation update process. Experimental results on the ETH3D benchmark dataset, an aerial dataset, a building dataset, and real scenarios substantiate the superior performance of our method in producing 3D building models compared to other state-of-the-art methods. In summary, our work aims to enhance the completeness and density of 3D building reconstruction, with implications for broader applications in urban planning and virtual reality.
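
The abstract names RANSAC as the tool for building plane priors inside segmented regions. As a rough illustration only, the sketch below fits one plane to a set of 3D points with a basic RANSAC loop. It is not the authors' implementation: the function names `plane_from_points` and `ransac_plane`, the default `threshold` and `iterations`, and the assumption that the points of a single segment are already available as `(x, y, z)` tuples are all hypothetical choices made for this example.

```python
import random


def plane_from_points(p, q, r):
    """Return (normal, d) of the plane n.x + d = 0 through three points.

    Returns None when the points are (nearly) collinear.
    """
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    # The cross product of two in-plane vectors gives the plane normal.
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d


def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Fit a plane to 3D points by RANSAC (hypothetical helper).

    Returns the best (normal, d) model and the indices of its inliers,
    i.e. points whose distance to the plane is below `threshold`.
    """
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate sample, draw again
        n, d = model
        inliers = [
            i
            for i, x in enumerate(points)
            if abs(n[0] * x[0] + n[1] * x[1] + n[2] * x[2] + d) < threshold
        ]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = model, inliers
    return best_plane, best_inliers
```

In a pipeline like the one described, a fit of this kind would presumably be run once per segmented weakly-textured region, and the resulting plane would become one candidate in the prior set. A production version would usually refine the plane on all inliers with a least-squares step.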

Paper Structure

This paper contains 22 sections, 22 equations, 15 figures, 4 tables.

Figures (15)

  • Figure 1: Overall workflow of the proposed approach. In the first stage, we generate raw depth with the ACMP algorithm and build the plane prior candidates, which consist of the Delaunay triangulation prior and the SAM Planar Prior (SPP). We then embed the prior candidate set into the depth estimation update using the global information aggregation (GIA) cost function, along with geometric consistency for epipolar constraints (CGEC). Finally, we obtain the final depth estimate after PatchMatch MVS with geometric consistency.
  • Figure 2: Overall workflow of ACMP. First, a PatchMatch MVS model is applied to the input images to generate raw depth. After sparsification, Delaunay triangulation is used to model planes and generate a triangulated plane prior. The planar models then provide planar prior assistance to optimize the depth maps.
  • Figure 3: (a) ETH3D high-resolution images, (b) Segment mask obtained through SAM, (c) Triangulation segmentation.
  • Figure 4: Epipolar Constraints. The camera centers of the left and right images are $C_1$ and $C_2$, respectively. $P$ is a coordinate in the world coordinate system, which represents a point in the scene. $C_1$, $C_2$, and $P$ create an epipolar plane. The epipolar lines $l_1$ and $l_2$ are the lines of intersection between the epipolar plane and the two image planes $I_r$ and $I_s$.
  • Figure 5: The selection of domain pixels in the global information aggregation cost. $p$ is the current pixel to be aggregated, the blue part represents the selected domain pixels of $p$, and the white part indicates the pixels that have not been aggregated.
  • ...and 10 more figures
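
The epipolar geometry described in the Figure 4 caption can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the paper's CGEC term: it assumes normalized (calibrated) image coordinates and a relative pose `(R, t)` that maps camera-1 coordinates to camera-2 coordinates as `X2 = R X1 + t`. Under those assumptions the essential matrix is `E = [t]_x R`, and matching points satisfy `x2^T E x1 = 0`. All function names are hypothetical.

```python
def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) applied to v equals t x v."""
    return [
        [0.0, -t[2], t[1]],
        [t[2], 0.0, -t[0]],
        [-t[1], t[0], 0.0],
    ]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def matvec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential_matrix(R, t):
    """E = [t]_x R for the pose convention X2 = R X1 + t (assumed here)."""
    return matmul(skew(t), R)


def project(X):
    """Project a 3D point in camera coordinates to normalized homogeneous form."""
    return [X[0] / X[2], X[1] / X[2], 1.0]


def epipolar_line(E, x1):
    """Epipolar line l2 = E x1 in the second image, as (a, b, c) with a*u + b*v + c = 0."""
    return matvec(E, x1)


def epipolar_residual(E, x1, x2):
    """Algebraic residual x2^T E x1, which is zero for a perfect correspondence."""
    l2 = epipolar_line(E, x1)
    return sum(x2[i] * l2[i] for i in range(3))
```

For example, with `R` set to the identity and `t = [-1.0, 0.0, 0.0]`, the scene point `P = [0.5, 0.2, 4.0]` projects to `x1` in the first view and `project([P[0] + t[0], P[1], P[2]])` in the second, and the residual for that pair is zero up to floating-point error. A candidate match that lies off the epipolar line gives a clearly nonzero residual, which is the kind of signal a geometric consistency check can use.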