GenAI Models Capture Urban Science but Oversimplify Complexity
Yecheng Zhang, Rong Zhao, Zimu Huang, Xinyu Wang, Yue Ma, Ying Long
TL;DR
AI4US introduces a Generate-Evaluate-Calibrate workflow to test GenAI's utility as a virtual laboratory for urban science, assessing both symbolic (theory-driven) and perceptual (scene-based) data. The study finds that GenAI can reproduce core urban patterns, such as scaling and center-periphery decay, and can mimic qualitative vitality, yet the synthetic outputs tend to be homogeneous and biased in their parameters, creating "Mirage cities". A post-hoc calibration based on Optimal Transport significantly improves distributional fidelity and, together with targeted prompting, brings outputs closer to empirical data, though GenAI is not yet a true world model. The work outlines a practical hybrid research path in which generative priors enable rapid theory exploration and hypothesis generation, while calibration and causal urban simulators provide the depth and validity needed for robust urban science.
Abstract
Generative artificial intelligence (GenAI) models are increasingly used for scientific data generation, yet their alignment with empirical knowledge in urban science remains unclear. Here, we introduce AI4US (Artificial Intelligence for Urban Science), a framework that systematically evaluates leading GenAI models by testing their fidelity in generating both symbolic and perceptual urban data. For the symbolic domain, we benchmark generated data against foundational urban theories concerning scale, space, and morphology. For the perceptual domain, we validate the models' visual judgments against human benchmarks and, critically, leverage their generative control to conduct causal experiments on urban perception. Our findings show that while GenAI models reproduce core theoretical patterns, the generated data exhibit crucial limitations: poor diversity, systematic parametric deviations, and only limited improvement from prompt engineering. To address this, we introduce a post-hoc calibration procedure using optimal transport, which produces synthetic symbolic datasets with demonstrably higher fidelity.
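To make the idea of optimal-transport calibration concrete, the following is a minimal one-dimensional sketch, not the procedure used in the paper, whose details are not given here. It assumes a single numeric variable and uses the fact that, in one dimension and for convex costs, the optimal transport map is the monotone quantile-to-quantile map. The function names `ot_calibrate_1d` and `wasserstein_1d` and the toy distributions are hypothetical choices for illustration only.

```python
import numpy as np


def ot_calibrate_1d(synthetic, empirical):
    """Push synthetic values onto the empirical distribution.

    Illustrative assumption: one variable, so the optimal transport map
    reduces to matching quantiles (monotone rearrangement).
    """
    synthetic = np.asarray(synthetic, dtype=float)
    empirical = np.asarray(empirical, dtype=float)
    # Rank of each synthetic value, turned into a quantile level in (0, 1).
    ranks = np.argsort(np.argsort(synthetic))
    levels = (ranks + 0.5) / synthetic.size
    # Evaluate the empirical quantile function at those levels.
    return np.quantile(empirical, levels)


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1D samples."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    return float(np.mean(np.abs(a - b)))


# Toy data (hypothetical): a narrow synthetic sample and a skewed
# "empirical" sample standing in for observed urban measurements.
rng = np.random.default_rng(0)
synthetic = rng.normal(loc=0.0, scale=1.0, size=500)
empirical = rng.lognormal(mean=1.0, sigma=0.5, size=500)

calibrated = ot_calibrate_1d(synthetic, empirical)
before = wasserstein_1d(synthetic, empirical)
after = wasserstein_1d(calibrated, empirical)
print(f"W1 before: {before:.3f}, after: {after:.3f}")
```

The map preserves the rank order of the synthetic sample while replacing its marginal distribution with the empirical one, which is the sense in which calibration is "post-hoc": the generator is untouched and only its outputs are transported. A multivariate calibration would need a general optimal transport solver rather than this quantile shortcut.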
