ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points

Qirui Huang, Runze Zhang, Kangjun Liu, Minglun Gong, Hao Zhang, Hui Huang

TL;DR

ArcPro extracts structured 3D abstractions from extremely sparse and noisy architectural point clouds by learning a mapping from points to an architectural program P written in a domain-specific language (DSL), and then compiling P into a mesh Y with a geometry compiler. A 3D CNN encoder and a Transformer decoder autoregressively predict the tokenized program under a finite-state machine (FSM) that enforces the DSL syntax, and synthetic training pairs produced by DSL-based procedural generation provide large-scale supervision. On SfM and sparsely sampled point-cloud datasets, ArcPro outperforms both traditional architectural proxy reconstruction and learning-based abstraction methods, and it extends to multi-view image input and natural language retrieval. Combining architectural priors, a programmatic representation, and efficient inference gives a scalable, interpretable route to urban modeling and digital twins, with potential applications in AR/VR, planning, and language-grounded architecture search.

Abstract

We introduce ArcPro, a novel learning framework built on architectural programs to recover structured 3D abstractions from highly sparse and low-quality point clouds. Specifically, we design a domain-specific language (DSL) to hierarchically represent building structures as a program, which can be efficiently converted into a mesh. We bridge feedforward and inverse procedural modeling by using a feedforward process for training data synthesis, allowing the network to make reverse predictions. We train an encoder-decoder on the points-program pairs to establish a mapping from unstructured point clouds to architectural programs, where a 3D convolutional encoder extracts point cloud features and a transformer decoder autoregressively predicts the programs in a tokenized form. Inference by our method is highly efficient and produces plausible and faithful 3D abstractions. Comprehensive experiments demonstrate that ArcPro outperforms both traditional architectural proxy reconstruction and learning-based abstraction methods. We further explore its potential to work with multi-view image and natural language inputs.
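The TL;DR describes the decoder as predicting program tokens under a syntax-constrained FSM. One common way to realize that idea is to mask out, at every step, the tokens the grammar does not allow in the current state. The sketch below is only an illustration of that masking technique: the vocabulary, the grammar (`BOS (LAYER <height> <width>)+ EOS`), the state names, and `toy_scores` (a stand-in for the transformer decoder) are all hypothetical and are not taken from the paper's DSL.

```python
import math

# Hypothetical toy vocabulary; the paper's actual DSL tokens are not shown here.
VOCAB = ["BOS", "LAYER", "H1", "H2", "W1", "W2", "EOS"]

# Hypothetical FSM for programs of the form: BOS (LAYER <height> <width>)+ EOS
# Each state maps an allowed token to the next state.
FSM = {
    "start": {"BOS": "need_layer"},
    "need_layer": {"LAYER": "height"},
    "height": {"H1": "width", "H2": "width"},
    "width": {"W1": "layer_or_end", "W2": "layer_or_end"},
    "layer_or_end": {"LAYER": "height", "EOS": "done"},
}


def constrained_greedy_decode(score_fn, max_len=32):
    """Greedy decoding where the FSM masks out syntactically invalid tokens."""
    state, tokens = "start", []
    while state != "done" and len(tokens) < max_len:
        scores = score_fn(tokens)  # one raw score per vocabulary entry
        allowed = FSM[state]
        # Forbidden tokens get -inf, so they can never be selected.
        masked = [s if tok in allowed else -math.inf
                  for tok, s in zip(VOCAB, scores)]
        best = VOCAB[max(range(len(VOCAB)), key=masked.__getitem__)]
        tokens.append(best)
        state = allowed[best]
    return tokens


def toy_scores(prefix):
    # Stand-in for a learned decoder: fixed, syntax-unaware scores.
    # Without the mask, the argmax would be EOS at every step.
    return [0.0, 0.5, 0.2, 0.6, 0.7, 0.1, 0.9]


program = constrained_greedy_decode(toy_scores)
print(program)
```

Even though the stand-in scorer always prefers `EOS`, the mask forces a well-formed token sequence. If `max_len` is reached before the FSM arrives at `done`, the sequence is truncated and may be incomplete, so a real system would need to handle that case separately.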

Paper Structure

This paper contains 46 sections, 4 equations, 18 figures, 4 tables.

Figures (18)

  • Figure 1: Structured 3D abstraction (bottom, with a zoom-in at top right) in the form of architectural programs, obtained by our method ArcPro, which takes as input an extremely sparse point cloud segment (only $\approx$300 points) over each building. Despite such low-density, non-uniform, and noisy inputs, our method produces clean, low-face-count meshes that structurally conform to the real building objects. ArcPro can process 1,090 buildings over an area of 2.92 km$^2$ in approximately 37 seconds on a single 4090 GPU.
  • Figure 2: Our architectural programs, a DSL defined using Backus-Naur Form, and the ArcPro method for structured 3D abstraction from point clouds. Procedural generation synthesizes paired programs and 3D meshes, from which point clouds are sampled to create input-output pairs. The network, consisting of a 3D convolutional encoder and a transformer decoder, is trained to autoregressively predict a program in tokenized format, which is then compiled into a 3D mesh as a structured abstraction of the input during inference.
  • Figure 3: The geometry compilation process transforms the program into architectural meshes by constructing an architectural tree that encodes layer heights, contours, and spatial relationships.
  • Figure 4: Child nodes contour generation: single child (top) and multiple children (bottom). Left shows parent node contour; right shows possible child node contours.
  • Figure 5: Qualitative comparison of our method with state-of-the-art (SOTA) methods on two evaluation datasets: SfM point clouds and sparsely sampled point clouds. The SOTA methods include traditional proxy reconstruction (PolyFit [nan2017polyfit], KSR [bauchet2020kinetic], ProxyRecon [ProxyRecon24]) and learning-based 3D abstraction (CA [yang2021unsupervised], BSP-Net [chen2020bsp]). Our method outperforms all these alternatives. ArcPro's program representation balances the geometric primitives of cubes (as in CA) and planes (as in BSP-Net), enabling more efficient 3D abstraction of building structures.
  • ...and 13 more figures
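
The caption of Figure 3 describes geometry compilation as building an architectural tree that encodes layer heights, contours, and spatial relationships. As a rough illustration of that idea only, the sketch below stacks extruded polygon prisms by walking such a tree. The `Layer` class, its fields, and the `extrude` and `compile_tree` helpers are hypothetical names chosen for this example, and the output uses untriangulated polygon faces; the paper's actual compiler is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Hypothetical tree node: a 2D contour extruded by a height."""
    contour: list   # polygon as [(x, y), ...] in counter-clockwise order
    height: float   # extrusion height of this layer
    children: list = field(default_factory=list)  # layers stacked on top


def extrude(contour, z0, z1, vertices, faces):
    """Append one prism (bottom cap, top cap, side quads) to the mesh."""
    base, n = len(vertices), len(contour)
    vertices += [(x, y, z0) for x, y in contour]
    vertices += [(x, y, z1) for x, y in contour]
    faces.append([base + i for i in reversed(range(n))])  # bottom, facing down
    faces.append([base + n + i for i in range(n)])        # top, facing up
    for i in range(n):
        j = (i + 1) % n
        faces.append([base + i, base + j, base + n + j, base + n + i])


def compile_tree(node, z0=0.0, vertices=None, faces=None):
    """Depth-first walk: each child starts at the top of its parent."""
    if vertices is None:
        vertices, faces = [], []
    z1 = z0 + node.height
    extrude(node.contour, z0, z1, vertices, faces)
    for child in node.children:
        compile_tree(child, z1, vertices, faces)
    return vertices, faces


# A podium with a smaller tower on top (made-up dimensions).
tower = Layer(contour=[(2, 2), (6, 2), (6, 6), (2, 6)], height=20.0)
podium = Layer(contour=[(0, 0), (10, 0), (10, 8), (0, 8)], height=10.0,
               children=[tower])
vertices, faces = compile_tree(podium)
print(len(vertices), len(faces), max(v[2] for v in vertices))
```

Each four-sided layer contributes 8 vertices and 6 polygon faces, which is consistent with the low-face-count meshes the captions describe. This sketch does not merge coincident faces between a parent and its children or check that a child contour lies inside its parent.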