BuildAnyPoint: 3D Building Structured Abstraction from Diverse Point Clouds

Tongyan Hua, Haoran Gong, Yuan Liu, Di Wang, Ying-Cong Chen, Wufan Zhao

TL;DR

Loca-DiT, a Loosely Cascaded Diffusion Transformer, first recovers the underlying distribution from noisy or sparse points and then autoregressively encapsulates the recovered points into compact meshes. The goal is to recover artist-created building abstraction in the highly underconstrained setting of reconstruction from diverse point clouds.

Abstract

We introduce BuildAnyPoint, a novel generative framework for structured 3D building reconstruction from point clouds with diverse distributions, such as those captured by airborne LiDAR and Structure-from-Motion. To recover artist-created building abstraction in this highly underconstrained setting, we capitalize on the role of explicit 3D generative priors in autoregressive mesh generation. Specifically, we design a Loosely Cascaded Diffusion Transformer (Loca-DiT) that first recovers the underlying distribution from noisy or sparse points and then autoregressively encapsulates the recovered points into compact meshes. We first formulate distribution recovery as a conditional generation task by training latent diffusion models conditioned on input point clouds, and then tailor a decoder-only transformer for conditional autoregressive mesh generation based on the recovered point clouds. Our method delivers substantial qualitative and quantitative improvements over prior building abstraction methods. Furthermore, the recovered point clouds perform strongly on building point cloud completion benchmarks, exhibiting improved surface accuracy and distribution uniformity, which further evidences the effectiveness of our approach.
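
Read as a formula, the two-stage cascade described in the abstract can be sketched as follows. This is an illustrative formalization, not an equation taken from the paper. The symbols $\mathcal{P}_{in}$, $\mathcal{P}_{out}$, $\mathcal{T}_P$, $\mathcal{T}_M$, $\mathcal{MD}$, $\theta$ and $\phi$ follow the Figure 3 caption, while the per-token notation $t_i$, the sequence length $L$ and the name $\mathrm{Tok}$ for the point tokenizer are assumptions introduced here.

```latex
$$
\mathcal{P}_{out} \sim p_\theta(\mathcal{P}_{out} \mid \mathcal{P}_{in}), \qquad
\mathcal{T}_P = \mathrm{Tok}(\mathcal{P}_{out}), \qquad
p_\phi(\mathcal{T}_M \mid \mathcal{T}_P) = \prod_{i=1}^{L} p_\phi\left(t_i \mid t_{<i}, \mathcal{T}_P\right), \qquad
\mathcal{M} = \mathcal{MD}(\mathcal{T}_M)
$$
```

Under this reading, the mesh generator depends on the raw input only through the recovered points $\mathcal{P}_{out}$, which is consistent with the abstract's statement that mesh generation is "based on the recovered point clouds".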

Paper Structure (23 sections, 8 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: BuildAnyPoint showcases remarkable generalization across various point cloud distributions commonly found in urban settings. Left: airborne LiDAR. Middle: Structure-from-Motion. Right: noisy sparse sampling.
  • Figure 2: (a) Point2Building [liu2024point2building] aggressively compresses point clouds into a low-resolution feature grid to compensate for missing regions in the raw inputs. (b) ArcPro [huang2025arcpro] is constrained by a fixed architectural grammar that can only generate vertically extruded geometric primitives, limiting flexibility in representing common structures such as slanted roofs. (c) BuildAnyPoint probabilistically models the diverse building distributions within a high-resolution latent grid.
  • Figure 3: Overview of BuildAnyPoint, implemented using our generative framework Loca-DiT, which retrieves building abstraction from the input in two sequential steps via latent space transformations: (a) The hierarchical latent diffusion model $\theta$ generates an intermediate representation $\mathcal{P}_{out}$ conditioned on the input point cloud $\mathcal{P}_{in}$, where the finer level of the latent representation $\mathcal{G}_s$ is conditioned on the coarser one $\mathcal{G}_d$. (b) $\mathcal{P}_{out}$ is then tokenized into $\mathcal{T}_P$ to condition a decoder-only transformer $\phi$, which autoregressively generates the mesh token sequence $\mathcal{T}_M$. The final artist-created mesh $\mathcal{M}$ is obtained by applying the Mesh Detokenization step $\mathcal{MD}$ to $\mathcal{T}_M$.
  • Figure 4: Overview of the training process for each Loca-DiT module. Our generative framework can be summarized as maintaining a set of latent spaces, with each tailored generative model learning how to sample and form a consistent feature space.
  • Figure 5: 3D Building Structured Abstraction Comparison. Qualitative comparison on three common urban point cloud distributions against City3D [huang2022city3d] and Point2Building [liu2024point2building] (abbreviated as P2B.). Our generative framework achieves more complete and faithful structural recovery than the alternatives, a result attributed to its robust intermediate dense points (abbreviated as Inter.) reconstructed from the 3D generative prior, which ensures consistency across heterogeneous input scenarios.
  • ...and 4 more figures
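
The data flow in the Figure 3 caption (input points, recovered points, point tokens, mesh tokens, detokenized mesh) can be sketched in code. This is a minimal toy illustration of the control flow only, not the authors' implementation. The class `CascadeSketch`, the helpers `quantize` and `dequantize`, the resolution `BINS`, and the nine-tokens-per-triangle layout are all hypothetical, and the two learned models are replaced by trivial placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]
Face = Tuple[Point, Point, Point]

BINS = 128  # hypothetical quantization resolution for coordinates in [0, 1]


def quantize(x: float) -> int:
    """Map a coordinate in [0, 1] to an integer token in [0, BINS - 1]."""
    return min(BINS - 1, max(0, int(x * BINS)))


def dequantize(token: int) -> float:
    """Map a token back to the centre of its quantization bin."""
    return (token + 0.5) / BINS


@dataclass
class CascadeSketch:
    """Two loosely coupled stages; each callable stands in for one module."""
    recover_points: Callable[[List[Point]], List[Point]]    # stage (a), role of theta
    tokenize_points: Callable[[List[Point]], List[int]]     # P_out -> T_P
    generate_mesh_tokens: Callable[[List[int]], List[int]]  # stage (b), role of phi
    detokenize_mesh: Callable[[List[int]], List[Face]]      # role of MD: T_M -> M

    def run(self, p_in: List[Point]) -> List[Face]:
        p_out = self.recover_points(p_in)      # recovered (dense) points
        t_p = self.tokenize_points(p_out)      # conditioning tokens
        t_m = self.generate_mesh_tokens(t_p)   # mesh token sequence
        return self.detokenize_mesh(t_m)       # final mesh


def tokenize_points(points: List[Point]) -> List[int]:
    """Assumed tokenizer: flatten quantized x, y, z coordinates."""
    return [quantize(c) for p in points for c in p]


def detokenize_mesh(tokens: List[int]) -> List[Face]:
    """Assumed layout: every 9 tokens encode one triangle (3 vertices x 3 coords)."""
    faces = []
    for i in range(0, len(tokens) - len(tokens) % 9, 9):
        v = [dequantize(t) for t in tokens[i:i + 9]]
        faces.append((tuple(v[0:3]), tuple(v[3:6]), tuple(v[6:9])))
    return faces


# Placeholders: identity instead of a diffusion model, and "copy the first
# triangle" instead of an autoregressive transformer.
pipeline = CascadeSketch(
    recover_points=lambda pts: list(pts),
    tokenize_points=tokenize_points,
    generate_mesh_tokens=lambda t_p: t_p[:9],
    detokenize_mesh=detokenize_mesh,
)

p_in = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
mesh = pipeline.run(p_in)
```

Because the second stage consumes only the tokens produced from the recovered points, either placeholder can be swapped for a trained model without touching the other, which is one way to read the "loosely cascaded" design.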