Tokenizing Buildings: A Transformer for Layout Synthesis
Manuel Ladron de Guevara, Jinmo Rhee, Ardavan Bidgoli, Vaidas Razgaitis, Michael Bergin
TL;DR
SBM (Small Building Model) reframes BIM layout synthesis as sequence modeling by introducing BIM-Token Bundles and a mixed-type embedding module, enabling a single Transformer backbone to perform both room-embedding retrieval and autoregressive entity generation (Data-Driven Entity Prediction, DDEP). The two token streams (envelope and entity) preserve topology, geometry, and semantic attributes using wall-referenced coordinates, yielding geometry-aware embeddings and coherent, constraint-respecting layouts. Empirical results show DDEP achieves near-complete inventory coverage, superior navigability, and the lowest overlap/clearance violations compared with text, VLM, and domain-specific baselines, while SBM embeddings exhibit strong geometric clustering by room type. The work demonstrates that domain-specific tokenization and a unified Transformer can outperform large general-purpose models on BIM layout tasks, with practical implications for scalable, editable BIM layout generation.
Abstract
We introduce Small Building Model (SBM), a Transformer-based architecture for layout synthesis in Building Information Modeling (BIM) scenes. We address the question of how to tokenize buildings by unifying heterogeneous feature sets of architectural elements into sequences while preserving compositional structure. Such feature sets are represented as a sparse attribute-feature matrix that captures room properties. We then design a unified embedding module that learns joint representations of categorical and possibly correlated continuous feature groups. Lastly, we train a single Transformer backbone in two modes: an encoder-only pathway that yields high-fidelity room embeddings, and an encoder-decoder pipeline for autoregressive prediction of room entities, referred to as Data-Driven Entity Prediction (DDEP). Experiments across retrieval and generative layout synthesis show that SBM learns compact room embeddings that reliably cluster by type and topology, enabling strong semantic retrieval. In DDEP mode, SBM produces functionally sound layouts, with fewer collisions and boundary violations and improved navigability.
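To make the idea of a unified embedding over mixed feature types concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each token carries some categorical fields and some continuous feature groups, that categorical fields are embedded by table lookup, that each continuous group is embedded by a linear projection, and that the results are summed into one vector. The class name `MixedTypeEmbedding`, the field names, the width `D_MODEL`, and the summation itself are hypothetical choices made for illustration; random weights stand in for learned parameters.

```python
import random

D_MODEL = 8  # hypothetical embedding width, chosen only for illustration


def make_matrix(rows, cols, rng):
    """Small random matrix as a list of rows (stand-in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MixedTypeEmbedding:
    """Embeds one token that has categorical fields and continuous groups.

    Assumption: per-field embeddings are summed into a single vector.
    """

    def __init__(self, cat_sizes, cont_dims, d_model=D_MODEL, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        # One lookup table per categorical field: vocab_size x d_model.
        self.tables = {
            name: make_matrix(size, d_model, rng)
            for name, size in cat_sizes.items()
        }
        # One linear projection per continuous group: group_dim x d_model.
        self.projections = {
            name: make_matrix(dim, d_model, rng)
            for name, dim in cont_dims.items()
        }

    def embed(self, categorical, continuous):
        """Return the summed embedding; absent fields contribute nothing."""
        out = [0.0] * self.d_model
        for name, index in categorical.items():
            row = self.tables[name][index]
            out = [o + r for o, r in zip(out, row)]
        for name, values in continuous.items():
            weights = self.projections[name]
            if len(values) != len(weights):
                raise ValueError(f"group {name!r} expects {len(weights)} values")
            # Project the whole group jointly, so correlated values
            # (e.g. x and y of a position) share one projection matrix.
            for value, w_row in zip(values, weights):
                out = [o + value * w for o, w in zip(out, w_row)]
        return out


# Usage: a token with a category and a position, but no size attribute.
embedder = MixedTypeEmbedding(
    cat_sizes={"category": 5},
    cont_dims={"position": 2, "size": 3},
)
token_vector = embedder.embed({"category": 2}, {"position": [0.4, 0.1]})
```

Because fields that a token lacks are simply skipped, the same module handles a sparse attribute-feature matrix in which different element types populate different columns. In a trained model the tables and projections would be learned parameters rather than fixed random values.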
