Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation
Hao Tang, Ling Shao, Nicu Sebe, Luc Van Gool
TL;DR
This work presents GTGAN, a graph Transformer GAN for graph-constrained architectural layout generation. Its graph Transformer encoder jointly learns local and global node relations through connected node attention (CNA) and non-connected node attention (NNA). A node-classification-based discriminator and a graph-based cycle-consistency loss preserve semantic structure and spatial adjacencies, complemented by a WGAN-GP objective. A graph masked modeling pre-training framework, which masks nodes and edges at a high ratio (40%), yields strong graph embeddings; fine-tuning then uses only the encoder for downstream tasks. Across house layout, house roof, and building layout generation, GTGAN++ achieves state-of-the-art realism, diversity, and compatibility, with substantial training-time efficiency gains and ablations that support each design choice. The approach is aimed at automated layout synthesis and design workflows for complex architectural layouts.
Abstract
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for challenging graph-constrained architectural layout generation tasks. The proposed graph-Transformer-based generator includes a novel graph Transformer encoder that combines graph convolutions and self-attention in a Transformer to model both local and global interactions across connected and non-connected graph nodes. Specifically, the proposed connected node attention (CNA) and non-connected node attention (NNA) capture the global relations across connected nodes and non-connected nodes in the input graph, respectively. The proposed graph modeling block (GMB) exploits local vertex interactions based on a house layout topology. Moreover, we propose a new node-classification-based discriminator to preserve the high-level semantic and discriminative node features for different house components. To maintain the relative spatial relationships between ground-truth and predicted graphs, we also propose a novel graph-based cycle-consistency loss. Finally, we propose a novel self-guided pre-training method for graph representation learning. This approach masks nodes and edges simultaneously at a high mask ratio (i.e., 40%) and then reconstructs them using an asymmetric graph-centric autoencoder architecture. This method markedly improves both how well and how quickly the model learns. Experiments on three challenging graph-constrained architectural layout generation tasks (i.e., house layout generation, house roof generation, and building layout generation) with three public datasets demonstrate the effectiveness of the proposed method in terms of objective quantitative scores and subjective visual realism. New state-of-the-art results are established by large margins on these three tasks.
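As a rough illustration of the masking step described above, the following minimal sketch selects 40% of the nodes and 40% of the edges of a toy layout graph to hide before reconstruction. The helper `mask_graph`, its arguments, and the uniform random sampling are assumptions made for illustration; the abstract states only that nodes and edges are masked simultaneously at a 40% ratio, not how the masked subsets are drawn or how masked entries are represented.

```python
import random


def mask_graph(num_nodes, edges, mask_ratio=0.4, seed=0):
    """Choose node and edge subsets to hide, each at `mask_ratio`.

    Hypothetical helper: uniform sampling without replacement is an
    assumption, not a detail taken from the paper.
    """
    rng = random.Random(seed)
    n_mask_nodes = round(mask_ratio * num_nodes)
    n_mask_edges = round(mask_ratio * len(edges))

    # Indices of the nodes and edges hidden from the encoder.
    masked_nodes = set(rng.sample(range(num_nodes), n_mask_nodes))
    masked_edges = set(rng.sample(range(len(edges)), n_mask_edges))

    return {
        "masked_nodes": sorted(masked_nodes),
        "visible_nodes": [i for i in range(num_nodes) if i not in masked_nodes],
        "masked_edges": sorted(masked_edges),
        "visible_edges": [e for k, e in enumerate(edges) if k not in masked_edges],
    }


# Toy layout graph: 10 rooms connected in a ring (illustrative data only).
toy_edges = [(i, (i + 1) % 10) for i in range(10)]
result = mask_graph(10, toy_edges, mask_ratio=0.4, seed=0)
print(len(result["masked_nodes"]), len(result["masked_edges"]))  # prints "4 4"
```

In a masked-autoencoder-style setup, the visible nodes and edges would be passed to the encoder and the masked ones would serve as reconstruction targets for the decoder. Whether edges incident to a masked node are also hidden is left open here.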
