Accelerating Urban Science Research with AI Urban Scientist

Tong Xia, Jiankun Zhang, Ruiwen You, Ao Xu, Linghao Zhang, Tengyao Tu, Jingzhi Wang, Jinghua Piao, Yunke Zhang, Fengli Xu, Yong Li

TL;DR

Urban science is hampered by manual, fragmented workflows despite abundant heterogeneous data. The paper proposes the AI Urban Scientist, a domain-informed multi-agent platform that combines a hypothesis base, expert-review signals, datasets, code, and simulators to enable end-to-end autonomous inquiry. Four specialized agents (Ideation, Critic, Data, and Analyzing) together generate hypotheses, discover datasets, run analyses, and synthesize outputs within an open, community-driven platform. The work highlights the need for domain-specific evaluation benchmarks, clear human roles, and shared standards to realize scalable, reproducible urban science discovery.

Abstract

Cities are complex, adaptive systems whose underlying principles remain difficult to disentangle despite unprecedented data abundance. Urban science therefore faces a fundamental challenge: converting vast, fragmented and interdisciplinary information into coherent explanations of how cities function and evolve. The emergence of AI scientists, i.e., agents capable of autonomous reasoning, hypothesis formation and data-driven experimentation, offers a new pathway toward accelerating this transformation, yet general-purpose systems fall short of the domain knowledge and methodological depth required for urban science research. Here we introduce a knowledge-driven AI Urban Scientist, built from hypotheses, peer-review signals, datasets and analytical patterns distilled from thousands of high-quality studies, and implemented as a coordinated multi-agent framework for end-to-end inquiry. The system generates structured hypotheses, retrieves and harmonizes heterogeneous datasets, conducts automated empirical analysis and simulation, and synthesizes insights in forms compatible with urban scientific reasoning. By providing reusable analytical tools and supporting community-driven extensions, the AI Urban Scientist lowers barriers to advanced urban analytics and acts not merely as an assistant but as an active collaborator in revealing the mechanisms that shape urban systems and in guiding the design of more resilient and equitable cities.

Paper Structure

This paper contains 13 sections and 8 figures.

Figures (8)

  • Figure 1: The AI urban scientist workflow. An overview of how an AI Scientist system can assist urban research by automating each stage of the scientific pipeline—from identifying key topics, generating research hypotheses, discovering relevant datasets, and producing experimental analyses, to drafting the final research paper. This workflow illustrates the end-to-end integration of ideation, interdisciplinary knowledge synthesis, data retrieval, auto-coding, and result interpretation enabled by AI-driven research agents.
  • Figure 2: Our AI urban scientist system. The system integrates five core knowledge resources: (1) a hypothesis base distilled from 15K academic papers, (2) a review base containing 2K+ expert comments, (3) a data base of 20K+ urban datasets, (4) a code base of 10K+ analytical scripts, and (5) a simulator reflecting standard urban modeling practices. These components support four collaborating agents: an Ideation Agent that generates and mutates hypotheses, a Critic Agent fine-tuned on review knowledge, a Data Agent responsible for data discovery, and an Analyzing Agent that executes analytical workflows. Together, the agents emulate the workflow of a domain-informed urban scientist, enabling reliable hypothesis generation, dataset matching, analysis execution, and scientific reasoning.
  • Figure 3: The hypothesis-generation workflow of the Ideation Agent. The system constructs a hypothesis base from 15K research papers and decomposes mature hypotheses into CAMP components—Context, Variables, Mechanism, and Pattern. The Ideation Agent applies four scientific transformations (context exchange, variable exchange, mechanism exchange, and pattern modification) to generate initial hypotheses. These ideas undergo iterative refinement through two mechanisms: a panel of multi-disciplinary virtual scientists and a domain-trained Critic Agent. The resulting refined hypotheses form high-quality candidates for downstream empirical analysis.
  • Figure 4: The construction and training pipeline of the Critic Agent. We align the Critic Agent with domain standards by integrating the editorial principles of Nature and Nature Cities as the system prompt. A corpus of 15K papers provides 15K+ seed hypotheses, while 2K+ expert reviewer comments from Nature family journals supply high-quality positive and negative evaluation signals for fine-tuning. Additionally, hypotheses extracted from the 15K papers are labeled into four tiers according to journal impact factor, enabling multi-level calibration. Together, these components produce a domain-informed Critic Agent capable of assigning idea-quality tiers (Tier 1–4 or reject) consistent with expert judgment in urban science.
  • Figure 5: The dataset construction and retrieval workflow of the Data Agent. The Data Agent extracts dataset information directly from the Data Availability sections of urban science papers using LLM-based semantic parsing, producing standardized data cards that include dataset name, region, time period, description, and URL. These cards populate a unified data pool spanning four major urban data categories: statistical infrastructure, human behavior, policy and survey, and multimodal sensing. Through semantic similarity matching, the Data Agent links hypotheses to relevant datasets and enables automated downloading, preprocessing, and integration for downstream analysis.
  • ...and 3 more figures
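
The CAMP decomposition and the four transformations described in the Figure 3 caption can be illustrated with a small sketch. This is not the authors' implementation: the `Hypothesis` class, the `exchange` and `modify_pattern` helpers, and the two example hypotheses are hypothetical names and values chosen only to show how swapping one component between two decomposed hypotheses yields a new candidate.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hypothesis:
    """A hypothesis split into the four CAMP components (illustrative only)."""
    context: str    # setting in which the claim is made
    variables: str  # quantities the claim relates
    mechanism: str  # proposed explanation
    pattern: str    # expected empirical relationship


# Components that the three "exchange" transformations may swap.
EXCHANGEABLE = ("context", "variables", "mechanism")


def exchange(target: Hypothesis, donor: Hypothesis, component: str) -> Hypothesis:
    """Context, variable, or mechanism exchange: copy one component from donor."""
    if component not in EXCHANGEABLE:
        raise ValueError(f"cannot exchange component: {component!r}")
    return replace(target, **{component: getattr(donor, component)})


def modify_pattern(target: Hypothesis, new_pattern: str) -> Hypothesis:
    """Pattern modification: keep context, variables, mechanism; change the pattern."""
    return replace(target, pattern=new_pattern)


# Two made-up hypotheses, used purely as placeholders.
base = Hypothesis(
    context="large metropolitan areas",
    variables="population size and innovation output",
    mechanism="denser social interaction",
    pattern="superlinear scaling",
)
donor = Hypothesis(
    context="small and mid-sized towns",
    variables="road density and commute time",
    mechanism="infrastructure sharing",
    pattern="sublinear scaling",
)

if __name__ == "__main__":
    # Each call yields one new candidate hypothesis; the inputs are unchanged.
    for component in EXCHANGEABLE:
        print(exchange(base, donor, component))
    print(modify_pattern(base, "saturating growth"))
```

Because the dataclass is frozen, every transformation returns a new object, so the original hypotheses in the base stay intact. In the described system, candidates like these would then go through refinement by the virtual scientist panel and the Critic Agent, which this sketch does not model.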