Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation

Hao Tang; Ling Shao; Nicu Sebe; Luc Van Gool

TL;DR

This work presents GTGAN, a graph Transformer GAN for graph-constrained architectural layout generation. Its graph Transformer encoder jointly learns local and global node relations through connected node attention (CNA), non-connected node attention (NNA), and a graph modeling block. A node-classification-based discriminator and a graph-based cycle-consistency loss preserve semantic structure and spatial adjacencies, complemented by a WGAN-GP objective. A graph masked modeling pre-training framework masks nodes and edges at a high ratio to learn strong graph embeddings; fine-tuning then uses only the encoder for downstream tasks. Across house layout, house roof, and building layout generation, GTGAN++ achieves state-of-the-art realism, diversity, and compatibility, with substantial training-time efficiency gains, and ablations validate each design choice. The approach advances graph-aware generative modeling for complex architectural layouts, with applications in automated layout synthesis and design workflows.
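To make the split between connected and non-connected node attention concrete, the following is a minimal sketch and not the paper's formulation. It assumes single-head dot-product attention with no learned projections, and the helper names `split_masks` and `masked_attention` are hypothetical. The only idea it illustrates is that the same attention routine can be run twice, once restricted to graph neighbors and once restricted to non-neighbors.

```python
import math

def split_masks(num_nodes, edges):
    """Build boolean masks for connected and non-connected node pairs (hypothetical helper)."""
    connected = [[False] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        connected[u][v] = connected[v][u] = True
    # Non-connected pairs exclude self-pairs and existing edges.
    non_connected = [
        [i != j and not connected[i][j] for j in range(num_nodes)]
        for i in range(num_nodes)
    ]
    return connected, non_connected

def masked_attention(features, allowed):
    """Dot-product self-attention where node i only attends to j if allowed[i][j].

    Assumption: no learned query/key/value projections; a node with no
    allowed partner receives a zero vector.
    """
    dim = len(features[0])
    out = []
    for i, query in enumerate(features):
        scores = {
            j: sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for j, key in enumerate(features)
            if allowed[i][j]
        }
        if not scores:
            out.append([0.0] * dim)
            continue
        top = max(scores.values())  # subtract the max for numerical stability
        weights = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(weights.values())
        out.append([
            sum(w / total * features[j][d] for j, w in weights.items())
            for d in range(dim)
        ])
    return out

# Toy graph: three rooms, only rooms 0 and 1 are adjacent.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
connected, non_connected = split_masks(3, [(0, 1)])
cna_out = masked_attention(features, connected)      # attends over neighbors only
nna_out = masked_attention(features, non_connected)  # attends over non-neighbors only
```

In this toy graph, room 2 has no neighbors, so the connected branch gives it no signal, while the non-connected branch still lets it aggregate information from rooms 0 and 1.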

Abstract

We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for challenging graph-constrained architectural layout generation tasks. The proposed graph-Transformer-based generator includes a novel graph Transformer encoder that combines graph convolutions and self-attention in a Transformer to model both local and global interactions across connected and non-connected graph nodes. Specifically, the proposed connected node attention (CNA) and non-connected node attention (NNA) capture the global relations across connected nodes and non-connected nodes in the input graph, respectively. The proposed graph modeling block (GMB) exploits local vertex interactions based on a house layout topology. Moreover, we propose a new node classification-based discriminator to preserve the high-level semantic and discriminative node features for different house components. To maintain the relative spatial relationships between ground truth and predicted graphs, we also propose a novel graph-based cycle-consistency loss. Finally, we propose a novel self-guided pre-training method for graph representation learning. This approach masks nodes and edges simultaneously at a high mask ratio (i.e., 40%) and reconstructs them with an asymmetric graph-centric autoencoder architecture. This method markedly improves both the quality and the speed of the model's learning. Experiments on three challenging graph-constrained architectural layout generation tasks (i.e., house layout generation, house roof generation, and building layout generation) with three public datasets demonstrate the effectiveness of the proposed method in terms of objective quantitative scores and subjective visual realism. New state-of-the-art results are established by large margins on these three tasks.
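The masking step of the pre-training method can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `mask_graph` is hypothetical, nodes and edges are sampled independently and uniformly at random, and the 40% ratio is the only value taken from the abstract. The encoder and decoder that would reconstruct the masked elements are omitted.

```python
import random

def mask_graph(num_nodes, edges, mask_ratio=0.4, seed=0):
    """Randomly select nodes and edges to mask at the same ratio (hypothetical helper).

    Assumption: node and edge masks are drawn independently; an edge may
    stay visible even if one of its endpoints is masked.
    """
    rng = random.Random(seed)
    n_mask_nodes = round(mask_ratio * num_nodes)
    n_mask_edges = round(mask_ratio * len(edges))

    masked_nodes = set(rng.sample(range(num_nodes), n_mask_nodes))
    masked_edges = set(rng.sample(edges, n_mask_edges))

    # Only the visible part would be fed to the encoder; the masked part
    # becomes the reconstruction target during pre-training.
    visible_nodes = [i for i in range(num_nodes) if i not in masked_nodes]
    visible_edges = [e for e in edges if e not in masked_edges]
    return masked_nodes, masked_edges, visible_nodes, visible_edges

# Toy layout graph with 10 rooms and 5 adjacency edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
masked_nodes, masked_edges, visible_nodes, visible_edges = mask_graph(10, edges)
print(len(masked_nodes), len(masked_edges))
```

With 10 nodes and 5 edges, a 40% ratio masks 4 nodes and 2 edges, leaving 6 nodes and 3 edges visible.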

Paper Structure (13 sections, 16 equations, 13 figures, 6 tables)

The Proposed Graph Transformer GAN
Graph Transformer-Based Generator
Node Classification-Based Discriminator
Graph-Based Cycle-Consistency Loss
Implementation Details
The Proposed Graph Masked Modeling
Pre-Training Strategy
Fine-Tuning Strategy
Experiments
Results on House Layout Generation
Results on House Roof Generation
Results on Building Layout Generation
Ablation Study

Figures (13)

Figure 1: Overview of the proposed graph Transformer encoder, which consists of a multi-head node attention and a graph modeling block. It can capture both global and local correlations for graph-constrained house generation. This encoder consists of $L{=}8$ identical blocks. The proposed connected node attention aims to capture long-range relations across connected nodes. Note that the proposed non-connected node attention has the same structure as the connected node attention but takes non-connected nodes as input. It aims to capture long-range relations across non-connected nodes.
Figure 2: Overview of the self-supervised pre-training tasks in GTGAN. A large share of nodes or edges is randomly masked, after which GTGAN undergoes a pre-training stage. During this stage, the original rooms are reconstructed from both the latent representations and the mask tokens.
Figure 3: Visualization results compared with HouseGAN (Nauata et al., 2020) and HouseGAN++ (Nauata et al., 2021) on the "1-3" subset. The last three rows contain non-connected nodes.
Figure 4: Visualization results compared with HouseGAN (Nauata et al., 2020) and HouseGAN++ (Nauata et al., 2021) on the "4-6" subset. The last three rows contain non-connected nodes.
Figure 5: Visualization results compared with HouseGAN (Nauata et al., 2020) and HouseGAN++ (Nauata et al., 2021) on the "7-9" subset. The last three rows contain non-connected nodes.
...and 8 more figures
