UrbanVerse: Learning Urban Region Representation Across Cities and Tasks

Fengze Sun, Egemen Tanin, Shanika Karunasekera, Zuqing Li, Flora D. Salim, Jianzhong Qi

TL;DR

UrbanVerse addresses the problem of generalizing urban region representations across cities and tasks with two integrated modules. CELearning learns transferable grid-cell embeddings from random-walk sequences with a transformer-based encoder and uses them to produce region embeddings. HCondDiffCT is a heterogeneous conditional diffusion framework that jointly models multiple downstream tasks using region priors and task semantics. The approach shifts from city-centric graphs to region-centric representations, which enables cross-city transfer, while the diffusion-based cross-task learner captures task-dependent distributions and uncertainties. Experiments on three real-world cities show improvements of up to 35.89% in $R^2$ across six tasks in cross-city settings, with further gains when HCondDiffCT is integrated into existing models and in cross-country and suburban scenarios. The work is a step towards a foundation-style urban analytics model that transfers knowledge across cities and tasks and provides uncertainty-aware predictions for decision making.

Abstract

Recent advances in urban region representation learning have enabled a wide range of applications in urban analytics, yet existing methods remain limited in their capabilities to generalize across cities and analytic tasks. We aim to generalize urban representation learning beyond city- and task-specific settings, towards a foundation-style model for urban analytics. To this end, we propose UrbanVerse, a model for cross-city urban representation learning and cross-task urban analytics. For cross-city generalization, UrbanVerse focuses on features local to the target regions and structural features of the nearby regions rather than the entire city. We model regions as nodes on a graph, which enables a random walk-based procedure to form "sequences of regions" that reflect both local and neighborhood structural features for urban region representation learning. For cross-task generalization, we propose a cross-task learning module named HCondDiffCT. This module integrates region-conditioned prior knowledge and task-conditioned semantics into the diffusion process to jointly model multiple downstream urban prediction tasks. HCondDiffCT is generic. It can also be integrated with existing urban representation learning models to enhance their downstream task effectiveness. Experiments on real-world datasets show that UrbanVerse consistently outperforms state-of-the-art methods across six tasks under cross-city settings, achieving up to 35.89% improvements in prediction accuracy.
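The random walk-based procedure mentioned in the abstract can be illustrated with a minimal sketch. It assumes a rectangular grid of cells with 4-connected neighbours and uniform transition probabilities; the function names, the grid layout, and the walk length are hypothetical choices for illustration, not the authors' implementation.

```python
import random


def grid_neighbors(cell, n_rows, n_cols):
    """Return the 4-connected neighbours of a (row, col) cell inside the grid."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < n_rows and 0 <= j < n_cols]


def random_walk(start, length, n_rows, n_cols, rng):
    """Uniform random walk of up to `length` cells starting at `start`.

    Assumption: every neighbour is equally likely at each step.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = grid_neighbors(walk[-1], n_rows, n_cols)
        if not neighbors:  # isolated cell (1x1 grid): stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


# Build a few cell sequences around the centre cell of a 3x3 grid.
rng = random.Random(0)
walks = [random_walk((1, 1), length=5, n_rows=3, n_cols=3, rng=rng) for _ in range(4)]
for w in walks:
    print(w)
```

Each walk is a sequence of neighbouring cells, so a sequence encoder that consumes it sees only the local neighbourhood of the starting cell rather than the whole city graph.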

Paper Structure

This paper contains 45 sections, 32 equations, 8 figures, 17 tables, 2 algorithms.

Figures (8)

  • Figure 1: Region representation learning frameworks.
  • Figure 2: UrbanVerse model overview. The model supports cross-city urban representation learning and cross-task urban analytics through two components: (1) CELearning takes a set $C$ of grid cells and first learns cell embeddings $\mathbf{E}$ via cell sequences formed by random walks over a graph of cells. It then aggregates cell embeddings to generate region embeddings $\mathbf{H}$ across multiple cities, facilitating cross-city generalization. (2) HCondDiffCT jointly models multiple tasks within a diffusion process to achieve cross-task generalization.
  • Figure 3: Prior knowledge generation.
  • Figure 4: Task-conditioned denoiser.
  • Figure 5: Ablation study results (NYC).
  • ...and 3 more figures
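The Figure 2 caption says that cell embeddings $\mathbf{E}$ are aggregated into region embeddings $\mathbf{H}$. The sketch below shows one simple way this could be done, assuming mean pooling over the cells that belong to each region. The aggregation operator is not specified in this overview, so mean pooling, the function name, and the toy data are assumptions for illustration only.

```python
def aggregate_region_embeddings(cell_embeddings, region_to_cells):
    """Average the embeddings of the cells in each region.

    Assumption: mean pooling stands in for the (unspecified) aggregation step.
    cell_embeddings: dict mapping cell id -> list of floats (one row of E).
    region_to_cells: dict mapping region id -> non-empty list of cell ids.
    """
    region_embeddings = {}
    for region, cells in region_to_cells.items():
        vectors = [cell_embeddings[c] for c in cells]
        dim = len(vectors[0])
        region_embeddings[region] = [
            sum(v[k] for v in vectors) / len(vectors) for k in range(dim)
        ]
    return region_embeddings


# Toy example: three cells with 2-dimensional embeddings, grouped into two regions.
E = {"c0": [1.0, 0.0], "c1": [3.0, 2.0], "c2": [0.0, 4.0]}
regions = {"r0": ["c0", "c1"], "r1": ["c2"]}
H = aggregate_region_embeddings(E, regions)
print(H)
```

Because the pooling depends only on the cells inside a region, the same function applies unchanged to regions from any city.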