COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation

Liu He; Daniel Aliaga

TL;DR

COHO introduces a context-sensitive, city-scale urban layout generator built on a canonical graph representation and a Graph-based Masked AutoEncoder (GMAE). By encoding blocks, buildings, and communities into a unified graph and applying self-supervised masking with priority-based iterative sampling, it achieves realistic, semantically consistent 2.5D layouts across 330 US cities. It outperforms baselines on context-awareness and realism while offering fast inference and auxiliary capabilities, such as socio-economic metric prediction and semantic manipulation. The work provides an open dataset and code to enable scalable urban layout synthesis for planning, digital twins, and content creation, with potential extensions to 3D city modeling and multi-view synthesis.

Abstract

The generation of large-scale urban layouts has garnered substantial interest across various disciplines. Prior methods have relied either on procedural generation, which requires manual rule coding, or on deep learning, which needs abundant data. However, these approaches have not considered the context-sensitive nature of urban layout generation. Our approach addresses this gap by leveraging a canonical graph representation for the entire city, which facilitates scalability and captures the multi-layer semantics inherent in urban layouts. We introduce a novel graph-based masked autoencoder (GMAE) for city-scale urban layout generation. The method encodes attributed buildings, city blocks, communities, and cities into a unified graph structure, enabling self-supervised masked training of the graph autoencoder. Additionally, we employ scheduled iterative sampling for 2.5D layout generation, prioritizing the generation of important city blocks and buildings. Our approach achieves good realism, semantic consistency, and correctness across the heterogeneous urban styles in 330 US cities. Code and datasets are released at https://github.com/Arking1995/COHO.
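
As a rough illustration of the unified graph structure mentioned in the abstract, the sketch below stores one node per city block and one edge per pair of adjacent blocks, following the notation of the Figure 2 caption ($s_{i}$, $q_{i}$, $d_{ij}$). The class names `BlockNode` and `CityGraph`, their fields, and the use of plain Euclidean centroid distance are assumptions made for illustration only; they are not taken from the released code.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class BlockNode:
    """One city block b_i (hypothetical container, not the authors' class)."""
    centroid: Tuple[float, float]            # block location
    shape_features: List[float]              # s_i: block shape/location features
    layout_code: Optional[List[int]] = None  # q_i: quantized layout; None = masked


@dataclass
class CityGraph:
    """Canonical graph G: blocks as nodes, adjacency as edges."""
    nodes: List[BlockNode] = field(default_factory=list)
    # (i, j) with i < j  ->  d_ij, the distance between block centroids
    edges: Dict[Tuple[int, int], float] = field(default_factory=dict)

    def add_block(self, node: BlockNode) -> int:
        """Append a block and return its node index."""
        self.nodes.append(node)
        return len(self.nodes) - 1

    def connect(self, i: int, j: int) -> float:
        """Link two adjacent blocks; Euclidean distance is an assumption."""
        d = math.dist(self.nodes[i].centroid, self.nodes[j].centroid)
        self.edges[(min(i, j), max(i, j))] = d
        return d

    def neighbors(self, i: int) -> List[int]:
        """Indices of blocks that share an edge with block i."""
        return [b if a == i else a for (a, b) in self.edges if i in (a, b)]
```

A node whose `layout_code` is `None` plays the role of a masked block here: its shape features stay available while its building layout is left to be predicted.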

Paper Structure (41 sections, 4 equations, 9 figures, 4 tables)

Introduction
Related Work
Layout Generation.
Infinite Visual Synthesis.
Learning based on Quantized Representations.
Method
Canonical Graph Representation
Node (or City Block).
Building Layout Quantization.
Edge.
Graph-based Masked AutoEncoder
Priority-based Scheduled Generation
Experiments
Implementation Details
Open Dataset
...and 26 more sections

Figures (9)

Figure 1: Context-Sensitive Generation. Our method (green) pursues realistic context harmonization among neighboring city blocks, as in real data (bottom-middle). Other methods (e.g., LayoutDM \cite{inoue2023layoutdm}, GlobalMapper \cite{he2023globalmapper}) show excessive diversity or excessive similarity in city-scale layout generation (evaluated by the Context Score $CTS$, Eq. \ref{eqn:CTS2}). Fully random and identical layouts are synthetically generated to illustrate extreme cases.
Figure 2: Canonical Graph Representation. Our method represents a city as a canonical graph $G$. Each node $b_{i}$ represents a single city block, and each edge $e_{ij}$ connects spatially adjacent blocks. Each block/node corresponds to a set of node features $s_{i}$ and a quantized vector $q_{i}$ hierarchically capturing the enclosed building layouts. The edge feature $d_{ij}$ encodes the distance between block centroids. Graph $G$ is used for GMAE training.
Figure 3: Graph-based Masked Autoencoder. Given a canonical city graph, quantized building layout features $Q$ are masked with dynamic masking ratios $m\in[0.5, 1.0]$, while block shape and location features $S$ are kept. The GNN encoder uses message passing between neighboring nodes to obtain the context-aware node features $F$. The decoder uses $F$ to reconstruct $Q'$. The predicted $Q'$ are decoded to 2.5D urban layouts.
Figure 4: Priority-based Scheduled Generation. We iteratively apply the pretrained GMAE to reconstruct masked node features. In each iteration, we accept a certain ratio of predicted nodes, determined by the scheduling function $\beta(t) = 1-\cos(t/T)$. We obtain a full graph after $T$ iterations.
Figure 5: Qualitative Comparisons. Given the same road network (except SDXL \cite{podell2023sdxl}), all of the above methods generate urban layouts in a single pass without post-processing or human-in-the-loop refinement. Even rows are zoom-ins of the highlighted areas in the odd rows. Our method generates a realistic distribution of urban layouts with plausible context-dependent behaviors (as indicated by the $CTS$ score). VTN \cite{arroyo2021variational} and LayoutDM \cite{inoue2023layoutdm} show abnormal style shifts among neighboring blocks and no awareness of road networks. GlobalMapper \cite{he2023globalmapper} generates overly similar communities. SDXL \cite{podell2023sdxl} provides poor outpainting quality, with semantic errors and undesired overlaps.
...and 4 more figures
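
The masking and scheduled generation described in the captions of Figures 3 and 4 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `predict` is a hypothetical stand-in for the pretrained GMAE, $\beta(t)$ is read as the cumulative fraction of nodes accepted by iteration $t$, and "priority" is modeled as a per-node score returned by `predict`. Taken literally, $1-\cos(t/T)$ reaches only about 0.46 at $t = T$, so the sketch accepts all remaining nodes in the final iteration to obtain a full graph; that choice is also an assumption.

```python
import math
import random


def schedule(t, T):
    """beta(t) = 1 - cos(t / T), as written in the Figure 4 caption."""
    return 1.0 - math.cos(t / T)


def sample_mask(num_nodes, rng, low=0.5, high=1.0):
    """Draw a masking ratio m in [low, high] and mask that share of nodes."""
    m = rng.uniform(low, high)
    k = max(1, round(m * num_nodes))
    return set(rng.sample(range(num_nodes), k))


def scheduled_generation(num_nodes, predict, T):
    """Fill all nodes over T iterations.

    `predict(known)` is a hypothetical model call: given the already
    accepted nodes {index: value}, it returns {index: (value, score)}
    for every node that is still masked.
    """
    known = {}
    for t in range(1, T + 1):
        proposals = predict(known)
        if t == T:
            target = num_nodes  # assumption: accept everything at the end
        else:
            target = math.floor(schedule(t, T) * num_nodes)
        n_accept = max(0, target - len(known))
        # Highest-score (highest-priority) proposals are accepted first.
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for node, (value, _score) in ranked[:n_accept]:
            known[node] = value
    return known
```

For example, `scheduled_generation(10, predict, 5)` with any `predict` that scores every masked node returns a dictionary covering all ten nodes, with the earliest accepted nodes being those the stand-in model scored highest.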
