Synthesizing 3D Abstractions by Inverting Procedural Buildings with Transformers

Maximilian Dax, Jordi Berbel, Jan Stria, Leonidas Guibas, Urs Bergmann

TL;DR

This work infers abstracted 3D building descriptions from point clouds by using a transformer to invert a forward procedural-building model. It casts the problem in a simulation-based inference framework and trains the transformer to map point clouds to a programmatic abstraction expressed in a Protocol Buffer-based language, using a synthetic dataset generated from a procedural city model. Key contributions include a structured, tokenizable abstraction language, a Protocol Buffer–to–token scheme that ensures syntactic validity, and an encoder–decoder architecture that achieves high structural accuracy and robust inpainting under incomplete data. The results demonstrate strong in-distribution reconstruction and resilience to data perturbations. The main limitations arise from the expressiveness of the forward procedural model, which suggests future gains from more flexible procedural priors and real-data domain adaptation. The approach enables efficient rendering, editable abstractions, and principled evaluation of procedural models for applications in 3D mapping, synthetic environments, and AI training data generation.

Abstract

We generate abstractions of buildings, reflecting the essential aspects of their geometry and structure, by learning to invert procedural models. We first build a dataset of abstract procedural building models paired with simulated point clouds and then learn the inverse mapping through a transformer. Given a point cloud, the trained transformer then infers the corresponding abstracted building in terms of a programmatic language description. This approach leverages expressive procedural models developed for gaming and animation, and thereby retains desirable properties such as efficient rendering of the inferred abstractions and strong priors for regularity and symmetry. Our approach achieves good reconstruction accuracy in terms of geometry and structure, as well as structurally consistent inpainting.
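
To make the idea of a tokenizable programmatic description concrete, the following is a minimal sketch of how a nested building description could be flattened into a token sequence and checked for well-formedness. It is not the paper's tokenization scheme: the token vocabulary (`<name>`, `</name>`, `name=value`), the field names in the example, and the helpers `to_tokens` and `is_well_formed` are all assumptions made for illustration. Note also that the sketch only checks validity after the fact, whereas the paper describes a scheme that ensures it.

```python
from typing import Any

# Hypothetical token vocabulary (an assumption, not the paper's format):
#   "<name>"     opens a nested message
#   "</name>"    closes it
#   "name=value" encodes a scalar field


def to_tokens(name: str, message: dict[str, Any]) -> list[str]:
    """Flatten a nested message (dicts, lists, scalars) into a token list."""
    tokens = [f"<{name}>"]
    for key, value in message.items():
        # Treat a single value as a one-element repeated field.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                tokens.extend(to_tokens(key, item))
            else:
                tokens.append(f"{key}={item}")
    tokens.append(f"</{name}>")
    return tokens


def is_well_formed(tokens: list[str]) -> bool:
    """Check that every opening token is closed, in the right order."""
    stack: list[str] = []
    for token in tokens:
        if token.startswith("</"):
            if not stack or stack.pop() != token[2:-1]:
                return False
        elif token.startswith("<"):
            stack.append(token[1:-1])
        elif not stack:
            return False  # scalar field outside any message
    return not stack


# Illustrative building description; all field names are hypothetical.
building = {
    "height": 12,
    "storey": [
        {"facade": {"cells_pattern": {"cell": ["window", "window", "door"]}}},
    ],
}
tokens = to_tokens("building", building)
print(tokens)
print(is_well_formed(tokens))
```

A sequence that is cut off early, such as `tokens[:-1]`, fails the check because the outermost message is never closed. This is the kind of syntactic error a tokenization scheme with validity guarantees is meant to rule out.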

Paper Structure

This paper contains 10 sections, 1 equation, 7 figures, 4 tables.

Figures (7)

  • Figure 1: (a) We generate a synthetic training dataset by first unconditionally sampling building abstractions with a procedural model and then composing corresponding point clouds for each abstraction. (b) The inference model encodes the input with a point cloud transformer operating on point cloud voxels (Sec. \ref{subsec:transformer}). It then employs a language transformer to predict the corresponding abstraction in terms of our custom programmatic language (Sec. \ref{subsec:dataset}). (c) The transformer output is parsed as a Protocol Buffer and can be rendered in Unreal Engine 5.
  • Figure 2: Reconstruction performance of the transformer model. (a) Performance on various structural variables (see appendix Tab. \ref{tab:app-structural-evaluation} for definitions). (b) Reconstruction error as a function of the point cloud noise level, measured in terms of the mean deviation between noisy input point clouds and building geometries. The reconstruction error of the inferred buildings (blue) is only slightly larger than the reconstruction error of the ground truth buildings (black). Naturally, both grow with increasing point cloud noise level.
  • Figure 3: Inference results with modified point clouds. We drop random blocks (second row), a single large block in the center (third row), or split the point cloud in the center and move both halves away from each other (fourth row). The inference model successfully reconstructs the associated building. With the split modification, the missing information is filled with additional asset instances, resulting in additional columns of windows. We occasionally observe artefacts (red), like gaps between assets or slight offsets in window positions.
  • Figure 4: A procedural building is generated by placing a set of assets according to a set of handcrafted rules.
  • Figure 5: Custom format for the representation of abstract buildings. This hierarchically combines asset instances ("Cells") into recurring patterns ("CellsPattern"). These patterns are combined into facade instances, which can in turn be linked by the building storeys. Finally, a building is composed by combining such storeys along with variables characterizing the high-level geometry (height, footprint polygons) and the point cloud noise level. Definitions of CellModifier, Footprint and MaterialVariation are provided in Fig. \ref{fig:building_proto_full}.
  • ...and 2 more figures
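
The point cloud modifications described in the Figure 3 caption (dropping random blocks, removing one large central block, and splitting the cloud at its center) can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: the function names, the choice of the x axis for the split, the grid-based definition of a "block", and all parameter values are assumptions.

```python
import math
import random

Point = tuple[float, float, float]


def drop_random_blocks(points: list[Point], block_size: float,
                       drop_prob: float, seed: int = 0) -> list[Point]:
    """Drop each axis-aligned grid block of points with probability drop_prob."""
    rng = random.Random(seed)
    dropped: dict[tuple[int, int, int], bool] = {}
    kept = []
    for p in points:
        # Index of the grid block that contains this point.
        key = tuple(math.floor(c / block_size) for c in p)
        if key not in dropped:
            dropped[key] = rng.random() < drop_prob
        if not dropped[key]:
            kept.append(p)
    return kept


def drop_center_block(points: list[Point], half_width: float) -> list[Point]:
    """Remove points within half_width of the centroid in both x and y."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return [p for p in points
            if abs(p[0] - cx) > half_width or abs(p[1] - cy) > half_width]


def split_at_center(points: list[Point], gap: float) -> list[Point]:
    """Split along x at the centroid and move the halves apart by gap."""
    cx = sum(p[0] for p in points) / len(points)
    return [(x - gap / 2 if x < cx else x + gap / 2, y, z)
            for x, y, z in points]


# Toy "facade": a flat 10 x 4 grid of points (hypothetical data).
points = [(float(x), float(y), 0.0) for x in range(10) for y in range(4)]

print(len(drop_random_blocks(points, block_size=2.0, drop_prob=0.5)))
print(len(drop_center_block(points, half_width=1.0)))
print(len(split_at_center(points, gap=2.0)))
```

The split keeps every point but opens an empty band in the middle of the cloud. According to the caption, the inference model fills such missing regions with additional asset instances.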