Tokenizing Buildings: A Transformer for Layout Synthesis

Manuel Ladron de Guevara, Jinmo Rhee, Ardavan Bidgoli, Vaidas Razgaitis, Michael Bergin

TL;DR

Small Building Model (SBM) reframes BIM layout synthesis as sequence modeling by introducing BIM-Token Bundles and a mixed-type embedding module, enabling a single Transformer backbone to perform both room-embedding retrieval and autoregressive entity generation (Data-Driven Entity Prediction, DDEP). The two token streams (envelope and entity) preserve topology, geometry, and semantic attributes with wall-referenced coordinates, yielding geometry-aware embeddings and coherent, constraint-respecting layouts. Empirical results show DDEP achieves near-complete inventory coverage, superior navigability, and the lowest overlap/clearance violations compared with text, VLM, and domain-specific baselines, while SBM embeddings exhibit strong geometric clustering by room type. The work demonstrates that domain-specific tokenization and a unified Transformer can outperform large general-purpose models on BIM layout tasks, with practical implications for scalable, editable BIM layout generation.

Abstract

We introduce Small Building Model (SBM), a Transformer-based architecture for layout synthesis in Building Information Modeling (BIM) scenes. We address the question of how to tokenize buildings by unifying heterogeneous feature sets of architectural elements into sequences while preserving compositional structure. Such feature sets are represented as a sparse attribute-feature matrix that captures room properties. We then design a unified embedding module that learns joint representations of categorical and possibly correlated continuous feature groups. Lastly, we train a single Transformer backbone in two modes: an encoder-only pathway that yields high-fidelity room embeddings, and an encoder-decoder pipeline for autoregressive prediction of room entities, referred to as Data-Driven Entity Prediction (DDEP). Experiments across retrieval and generative layout synthesis show that SBM learns compact room embeddings that reliably cluster by type and topology, enabling strong semantic retrieval. In DDEP mode, SBM produces functionally sound layouts, with fewer collisions and boundary violations and improved navigability.
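
To make the idea of a unified embedding over mixed feature types concrete, here is a minimal sketch. It is not the paper's module: it assumes one common design in which a categorical attribute is looked up in an embedding table, a group of continuous attributes is linearly projected, and the two vectors are summed. The class name `MixedTypeEmbedding`, its parameters, and the random initialization are all hypothetical.

```python
import random


class MixedTypeEmbedding:
    """Illustrative only: sums a categorical lookup and a linear projection
    of a continuous feature group (an assumed design, not the paper's)."""

    def __init__(self, vocab_size, cont_dim, d_model, seed=0):
        rng = random.Random(seed)
        self.cont_dim = cont_dim
        self.d_model = d_model
        # One d_model-sized vector per categorical value.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                      for _ in range(vocab_size)]
        # Projection matrix of shape (cont_dim, d_model).
        self.proj = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                     for _ in range(cont_dim)]

    def embed(self, category_id, continuous):
        if len(continuous) != self.cont_dim:
            raise ValueError("unexpected number of continuous features")
        cat_vec = self.table[category_id]
        # Matrix-vector product: continuous (cont_dim) @ proj (cont_dim, d_model).
        cont_vec = [sum(x * self.proj[i][j] for i, x in enumerate(continuous))
                    for j in range(self.d_model)]
        return [c + p for c, p in zip(cat_vec, cont_vec)]


# Hypothetical token: category 3 with a wall-referenced (offset, width, height) group.
emb = MixedTypeEmbedding(vocab_size=10, cont_dim=3, d_model=8)
vector = emb.embed(3, [0.25, 1.2, 2.1])
```

Projecting the continuous features as a group, rather than one scalar at a time, lets the projection mix correlated values such as position and size, which is one plausible reading of "possibly correlated continuous feature groups".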

Paper Structure

This paper contains 37 sections, 5 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Small Building Model (SBM) is an encoder-decoder Transformer that generates functionally correct and semantically coherent layouts given a room envelope. Each row shows a different room type. Our approach outperforms general-purpose LLMs/VLMs and domain-specific methods.
  • Figure 2: Model overview. (a) BIM data extraction and assembly into a discrete set of token bundles. (b) The SBM encoder stack processes the tokenized feature-attribute matrix and outputs a room representation. (c) The SBM decoder stack consumes the room representation as memory for its cross-attention layers and the room entities as inputs, and is trained on next-token prediction. (d) Use cases: SBM is used for three main tasks: DDEP, information retrieval, and user-guided DDEP with the help of an agentic layer.
  • Figure 3: Qualitative comparison of generated layouts across five room types, showing representative results from seven baseline methods and our DDEP model.
  • Figure 4: UMAP visualization of room embeddings colored by room type category. SBM embeddings (left, NMI: 0.640) exhibit well-separated clusters with distinct boundaries between room types, demonstrating superior geometric and spatial understanding. E5-Large-v2 embeddings (right, NMI: 0.371) show more intermingled clusters with blurred boundaries, indicating weaker room type separation. The 1.7× higher NMI score for SBM reflects its specialization in capturing geometric structure and spatial relationships inherent in building layouts, rather than semantic similarity alone.
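
The NMI scores quoted for Figure 4 compare cluster assignments against room-type labels. The sketch below shows how such a score can be computed from two label lists. It assumes arithmetic-mean normalization of the two entropies; the captions do not state which normalization or clustering procedure the authors used, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length label lists."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())


def nmi(a, b):
    """Normalized mutual information, assuming arithmetic-mean normalization."""
    if len(a) != len(b):
        raise ValueError("label lists must have the same length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both labelings are a single cluster, so they agree trivially
    return mutual_information(a, b) / ((h_a + h_b) / 2.0)


# Ratio of the two reported scores, which the caption rounds to 1.7x.
ratio = 0.640 / 0.371
```

NMI is invariant to how cluster ids are named, so a clustering that matches the room types exactly scores 1 even if its ids are permuted, while a clustering independent of the room types scores 0.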