UrbanVerse: Learning Urban Region Representation Across Cities and Tasks

Fengze Sun; Egemen Tanin; Shanika Karunasekera; Zuqing Li; Flora D. Salim; Jianzhong Qi

TL;DR

UrbanVerse tackles the challenge of generalizing urban region representations across cities and tasks by introducing two integrated modules: CELearning, which learns transferable grid-cell embeddings from random-walk sequences with a transformer-based encoder and aggregates them into region embeddings, and HCondDiffCT, a heterogeneous conditional diffusion framework that jointly models multiple downstream tasks using region priors and task semantics. The approach shifts from city-centric graphs to region-centric representations, which enables cross-city transfer, while the diffusion-based cross-task learner captures task-dependent distributions and uncertainties. Experiments on three real-world cities show UrbanVerse achieving up to 35.89% improvement in $R^2$ across six tasks in cross-city settings, with further gains when HCondDiffCT is integrated into existing models and in cross-country and suburban scenarios. The work moves toward a foundation-style urban analytics model that transfers knowledge across cities and tasks and provides uncertainty-aware predictions useful for decision making.
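The random-walk step that CELearning builds on can be illustrated with a small sketch. It assumes a regular grid in which each cell is linked to its four edge-adjacent neighbours and every transition is uniform; the helper names `grid_neighbors` and `random_walks` and the parameters `walks_per_cell` and `walk_length` are hypothetical, and the paper's actual graph construction and walk strategy may differ. The resulting sequences of cell IDs are what a transformer-based encoder would consume; the encoder itself is not shown.

```python
import random


def grid_neighbors(rows, cols):
    """Build a 4-neighbour adjacency list for a rows x cols grid of cells.

    Cell (r, c) gets integer id r * cols + c. Hypothetical graph construction.
    """
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.append(rr * cols + cc)
            adj[r * cols + c] = nbrs
    return adj


def random_walks(adj, walks_per_cell, walk_length, seed=0):
    """Generate uniform random walks (sequences of cell ids) starting from every cell."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_cell):
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:  # isolated cell: stop the walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks


# Toy usage: a 3 x 3 grid, two walks of five cells from each cell.
adj = grid_neighbors(3, 3)
walks = random_walks(adj, walks_per_cell=2, walk_length=5)
```

Because each walk stays within a few hops of its start cell, the sequences encode local and neighbourhood structure without requiring a city-wide graph, which is the property the cross-city setting relies on.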

Abstract

Recent advances in urban region representation learning have enabled a wide range of applications in urban analytics, yet existing methods remain limited in their ability to generalize across cities and analytic tasks. We aim to generalize urban representation learning beyond city- and task-specific settings, towards a foundation-style model for urban analytics. To this end, we propose UrbanVerse, a model for cross-city urban representation learning and cross-task urban analytics. For cross-city generalization, UrbanVerse focuses on features local to the target regions and on structural features of nearby regions, rather than on the entire city. We model regions as nodes on a graph, which enables a random walk-based procedure to form "sequences of regions" that reflect both local and neighborhood structural features for urban region representation learning. For cross-task generalization, we propose a cross-task learning module named HCondDiffCT. This module integrates region-conditioned prior knowledge and task-conditioned semantics into the diffusion process to jointly model multiple downstream urban prediction tasks. HCondDiffCT is generic: it can also be integrated with existing urban representation learning models to enhance their downstream task effectiveness. Experiments on real-world datasets show that UrbanVerse consistently outperforms state-of-the-art methods across six tasks under cross-city settings, achieving up to 35.89% improvements in prediction accuracy.
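To make the idea of a region-conditioned prior inside a diffusion process concrete, here is a minimal forward-noising sketch. It borrows the prior-shifted formulation used in CARD-style regression diffusion, where the noised target drifts toward a prior mean $\boldsymbol{\mu}$ rather than toward zero: $q(\mathbf{y}_t \mid \mathbf{y}_0, \boldsymbol{\mu}) = \mathcal{N}\big(\sqrt{\bar\alpha_t}\,\mathbf{y}_0 + (1-\sqrt{\bar\alpha_t})\,\boldsymbol{\mu},\ (1-\bar\alpha_t)\mathbf{I}\big)$. Whether HCondDiffCT uses exactly this form is an assumption; the linear noise schedule, the function names `linear_alpha_bars` and `forward_noise`, and the layout of the six task targets as columns of one matrix are illustrative choices. The task-conditioned denoiser that reverses this process is not shown.

```python
import numpy as np


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear beta schedule (illustrative choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_noise(y0, prior_mean, t, alpha_bars, rng):
    """Sample y_t from the prior-shifted forward process (assumed CARD-style form).

    y0, prior_mean: arrays of shape (num_regions, num_tasks).
    Returns the noised targets and the Gaussian noise a denoiser would learn to predict.
    """
    a = alpha_bars[t]
    noise = rng.standard_normal(y0.shape)
    mean = np.sqrt(a) * y0 + (1.0 - np.sqrt(a)) * prior_mean
    return mean + np.sqrt(1.0 - a) * noise, noise


rng = np.random.default_rng(0)
alpha_bars = linear_alpha_bars(1000)
y0 = rng.standard_normal((4, 6))   # toy data: 4 regions x 6 task targets
prior = np.full_like(y0, 0.5)      # hypothetical region-conditioned prior mean
y_early, _ = forward_noise(y0, prior, 0, alpha_bars, rng)
y_late, noise = forward_noise(y0, prior, 999, alpha_bars, rng)
```

At `t = 0` the sample stays close to the true targets, while at `t = 999` its mean is almost entirely the prior, so reverse denoising starts from region-level prior knowledge instead of from pure zero-mean noise.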

Paper Structure (45 sections, 32 equations, 8 figures, 17 tables, 2 algorithms)

Introduction
Related Work
Proposed Solution
Preliminaries
Proposed Model
Cross-city Embedding Learning
Cell Embedding Learning
Adaptive Region Embedding Learning
Heterogeneous Conditional Diffusion-based Cross-task Learning
Region-conditioned Prior Guidance
Task-conditioned Denoiser
Model Training and Inference
Experiments
Experimental Settings
Overall Results (Q1)

Figures (8)

Figure 1: Region representation learning frameworks.
Figure 2: UrbanVerse model overview. The model supports cross-city urban representation learning and cross-task urban analytics through two components: (1) CELearning takes a set $C$ of grid cells and first learns cell embeddings $\mathbf{E}$ via cell sequences formed by random walks over a graph of cells. It then aggregates cell embeddings to generate region embeddings $\mathbf{H}$ across multiple cities, facilitating cross-city generalization. (2) HCondDiffCT jointly models multiple tasks within a diffusion process to achieve cross-task generalization.
Figure 3: Prior knowledge generation.
Figure 4: Task-conditioned denoiser.
Figure 5: Ablation study results (NYC).
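The Figure 2 caption describes aggregating cell embeddings $\mathbf{E}$ into region embeddings $\mathbf{H}$. One plausible way to make that aggregation adaptive is attention pooling with a learned query vector, sketched below. This is an assumption rather than the paper's stated mechanism: the query vector `query`, the scaled dot-product scoring, and the `cell_to_region` assignment array are hypothetical. With a zero query the weights become uniform and the pooling reduces to a plain mean over each region's cells.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()


def aggregate_regions(cell_emb, cell_to_region, num_regions, query):
    """Attention-pool cell embeddings E (n_cells x d) into region embeddings H (num_regions x d).

    cell_to_region[i] is the region index of cell i (hypothetical input format).
    """
    d = cell_emb.shape[1]
    H = np.zeros((num_regions, d))
    for r in range(num_regions):
        idx = np.flatnonzero(cell_to_region == r)
        if idx.size == 0:
            continue  # a region with no cells keeps a zero embedding
        cells = cell_emb[idx]
        weights = softmax(cells @ query / np.sqrt(d))  # one weight per cell, summing to 1
        H[r] = weights @ cells
    return H


# Toy usage: 6 cells with 4-dimensional embeddings grouped into 3 regions.
rng = np.random.default_rng(1)
E = rng.standard_normal((6, 4))
cell_to_region = np.array([0, 0, 1, 1, 1, 2])
H_mean = aggregate_regions(E, cell_to_region, 3, np.zeros(4))
H_attn = aggregate_regions(E, cell_to_region, 3, rng.standard_normal(4))
```

Pooling per region keeps the output size tied to the number of regions rather than to city extent, which fits the region-centric, cross-city framing in the caption.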