AIM2PC: Aerial Image to 3D Building Point Cloud Reconstruction

Soulaimene Turki, Daniel Panangian, Houda Chaabouni-Chouayakh, Ksenia Bittner

TL;DR

AIM2PC tackles the challenge of reconstructing complete 3D building point clouds from a single aerial image, addressing the limitations of rooftop-only reconstructions and the scarcity of pose-equipped datasets. It introduces an edge-enhanced, diffusion-based framework conditioned on concatenated image features, a binary building mask, and Sobel edge maps, implemented via a Centered Denoising Diffusion Probabilistic Model to fuse 2D cues into a fully 3D building representation. A new dataset providing complete 3D point clouds and corresponding camera poses enables training and fair benchmarking. Quantitative results show notable improvements in F-Score and Chamfer Distance over baselines PC² and CCD-3DR, with qualitative evidence of sharper edges and more complete geometry. This approach offers a scalable, cost-effective path for urban 3D reconstruction from single-view aerial imagery and establishes a resource for future comparisons.
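The two metrics named above can be illustrated with a minimal NumPy sketch. The exact definitions used in the paper are not stated here, so this assumes a common convention: Chamfer Distance as the sum of mean nearest-neighbour Euclidean distances in both directions, and F-Score as the harmonic mean of precision and recall at a distance threshold. The function names and the default `threshold` are hypothetical.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distance between every point in a (N, 3) and b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance (assumed, non-squared variant)."""
    d = pairwise_distances(pred, gt)
    # Mean distance from each predicted point to its nearest ground-truth
    # point, plus the same in the opposite direction.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def f_score(pred, gt, threshold=0.01):
    """F-Score at a distance threshold (the threshold value is an assumption)."""
    d = pairwise_distances(pred, gt)
    precision = (d.min(axis=1) < threshold).mean()  # predicted points near gt
    recall = (d.min(axis=0) < threshold).mean()     # gt points near prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Lower Chamfer Distance and higher F-Score indicate a closer match. The dense distance matrix is only practical for small clouds; a k-d tree would be the usual choice for larger ones.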

Abstract

Three-dimensional urban reconstruction of buildings from single-view images has attracted significant attention over the past two decades. However, recent methods primarily focus on rooftops from aerial images, often overlooking essential geometrical details. Additionally, there is a notable lack of datasets containing complete 3D point clouds for entire buildings, along with challenges in obtaining reliable camera pose information for aerial images. This paper addresses these challenges by presenting a novel methodology, AIM2PC, which utilizes our generated dataset that includes complete 3D point clouds and determined camera poses. Our approach takes features from a single aerial image as input and concatenates them with essential additional conditions, such as binary masks and Sobel edge maps, to enable more edge-aware reconstruction. By incorporating a point cloud diffusion model based on Centered Denoising Diffusion Probabilistic Models (CDPM), we project these concatenated features onto the partially denoised point cloud using our camera poses at each diffusion step. The proposed method is able to reconstruct the complete 3D building point cloud, including wall information, and demonstrates superior performance compared to existing baseline techniques. To allow further comparisons with our methodology, the dataset has been made available at https://github.com/Soulaimene/AIM2PCDataset
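The conditioning described in the abstract (image features concatenated with a binary mask and a Sobel edge map) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the image features have already been resampled to the image resolution as a `(C, H, W)` array, and the helper names `filter3x3`, `sobel_edge_map`, and `build_conditioning` are hypothetical.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Cross-correlate a 2D image with a 3x3 kernel, using edge padding."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def sobel_edge_map(gray):
    """Gradient magnitude of a grayscale image, scaled to [0, 1]."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    return magnitude / peak if peak > 0 else magnitude

def build_conditioning(features, mask, gray):
    """Stack features (C, H, W), mask (H, W), and Sobel edges into (C + 2, H, W)."""
    edges = sobel_edge_map(gray)
    return np.concatenate(
        [features, mask[None].astype(float), edges[None]], axis=0
    )
```

The resulting tensor carries two extra channels per pixel, so a point that projects near a building outline receives an explicit edge cue alongside the learned image features.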

Paper Structure

This paper contains 9 sections, 7 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: Example of the new views generated by AIM2PC from the single-view aerial input image.
  • Figure 2: Architecture of AIM2PC for reconstructing an entire building from a single aerial image, using an edge enhancement approach to capture edges and finer building details. Block (a) illustrates the features that are projected, using the generated camera pose, onto the partially denoised point cloud. These features are a concatenation of the ViT features extracted from the input image, the binary building mask, and the Sobel edge maps. Block (b) outlines our reverse process, which begins with Gaussian noise and progressively refines the point cloud to reconstruct the building using a Centered Denoising Diffusion Probabilistic Model (CDPM), conditioned on the combined features from Block (a).
  • Figure 3: Sample from our dataset, showing the RGB aerial image from our initial dataset, the binary mask of the building, the Sobel edge map, and our sampled normalized point cloud.
  • Figure 4: Qualitative comparison from a top view and an alternative viewing angle, showcasing our model alongside the baseline models. The panels show the single-view aerial input image, the reconstruction using PC², the reconstruction using CCD-3DR, the reconstruction using our method AIM2PC, and the ground truth. The comparison highlights the superior performance of our method in accurately reconstructing the roof's shape. This is evident in the top-view perspective of all models: AIM2PC generates a point cloud whose color distribution closely matches the ground truth. In contrast, PC² not only underestimates the roof height but also misses certain sections, while CCD-3DR shows incomplete roof reconstructions with missing key details, as highlighted in red circles. These edge and structural issues become even more apparent from alternative viewing angles.
  • Figure 5: Qualitative results of two novel reconstructed views generated by AIM2PC from single input aerial images. This demonstrates the capability of our solution to reconstruct the complete point cloud of the entire building.
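
The projection step described for Block (a) of Figure 2, where per-pixel features are attached to the points of the partially denoised cloud through the camera pose, can be sketched as below. This is a rough illustration under explicit assumptions: a pinhole camera with rotation `R`, translation `t`, and intrinsics `K`, nearest-pixel lookup, and all points in front of the camera. The camera model actually used for the aerial images is not specified here, and the function name `project_features` is hypothetical.

```python
import numpy as np

def project_features(points, feature_map, R, t, K):
    """Look up a feature vector for each 3D point.

    points:      (N, 3) world coordinates
    feature_map: (C, H, W) per-pixel conditioning features
    R, t:        assumed world-to-camera rotation (3, 3) and translation (3,)
    K:           assumed pinhole intrinsics (3, 3)
    Returns an (N, C) array of per-point features.
    """
    cam = points @ R.T + t             # world -> camera coordinates
    uvw = cam @ K.T                    # camera -> homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]      # perspective divide (depth assumed > 0)
    _, h, w = feature_map.shape
    # Nearest-pixel lookup, clamped to the image bounds.
    cols = np.clip(np.rint(uv[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(uv[:, 1]).astype(int), 0, h - 1)
    return feature_map[:, rows, cols].T
```

In a diffusion setting, a lookup of this kind would be repeated at every denoising step, since the point positions change as the cloud is refined. Bilinear interpolation is a common alternative to the nearest-pixel lookup used here.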