UrbanVerse: Learning Urban Region Representation Across Cities and Tasks
Fengze Sun, Egemen Tanin, Shanika Karunasekera, Zuqing Li, Flora D. Salim, Jianzhong Qi
TL;DR
UrbanVerse tackles the challenge of generalizing urban region representations across cities and tasks with two integrated modules. The first, CELearning, learns transferable grid-cell embeddings from random-walk sequences with a transformer-based encoder and aggregates them into region embeddings. The second, HCondDiffCT, is a heterogeneous conditional diffusion framework that jointly models multiple downstream tasks, conditioned on region priors and task semantics. The approach shifts from city-centric graphs to region-centric representations, which enables cross-city transfer, while the diffusion-based cross-task learner captures task-dependent distributions and uncertainties. Empirical results on three real-world cities show UrbanVerse improving $R^2$ by up to 35.89% across six tasks in cross-city settings, with further strong gains when HCondDiffCT is integrated into existing models and in cross-country and suburban scenarios. The work moves towards a foundation-style urban analytics model that transfers knowledge across cities and tasks and provides uncertainty-aware predictions useful for decision-making.
Abstract
Recent advances in urban region representation learning have enabled a wide range of applications in urban analytics, yet existing methods remain limited in their ability to generalize across cities and analytic tasks. We aim to generalize urban representation learning beyond city- and task-specific settings, towards a foundation-style model for urban analytics. To this end, we propose UrbanVerse, a model for cross-city urban representation learning and cross-task urban analytics. For cross-city generalization, UrbanVerse focuses on features local to the target regions and structural features of the nearby regions rather than the entire city. We model regions as nodes on a graph, which enables a random walk-based procedure to form "sequences of regions" that reflect both local and neighborhood structural features for urban region representation learning. For cross-task generalization, we propose a cross-task learning module named HCondDiffCT. This module integrates region-conditioned prior knowledge and task-conditioned semantics into the diffusion process to jointly model multiple downstream urban prediction tasks. HCondDiffCT is generic: it can also be integrated with existing urban representation learning models to improve their effectiveness on downstream tasks. Experiments on real-world datasets show that UrbanVerse consistently outperforms state-of-the-art methods across six tasks under cross-city settings, with improvements of up to 35.89% in prediction accuracy.
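As a rough illustration of how random walks over a region graph can yield "sequences of regions", the following minimal Python sketch builds a small grid graph and samples fixed-length walks from every node. It is not the UrbanVerse implementation: the 4-neighbor grid adjacency, the uniform transition rule, and the names `grid_neighbors`, `random_walk`, and `sample_walks` are all assumptions made for this example, and the actual graph construction and walk strategy may differ.

```python
import random


def grid_neighbors(rows, cols):
    """Build a 4-neighbor adjacency map for a rows x cols grid of cells.

    Assumption: regions are grid cells linked to their direct neighbors.
    """
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.append((rr, cc))
            adj[(r, c)] = nbrs
    return adj


def random_walk(adj, start, length, rng):
    """Sample one uniform random walk of at most `length` nodes from `start`."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj[walk[-1]]
        if not nbrs:  # isolated node: the walk cannot continue
            break
        walk.append(rng.choice(nbrs))
    return walk


def sample_walks(adj, walks_per_node, length, seed=0):
    """Sample `walks_per_node` walks starting from every node in the graph."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [
        random_walk(adj, node, length, rng)
        for node in adj
        for _ in range(walks_per_node)
    ]


if __name__ == "__main__":
    adj = grid_neighbors(3, 4)
    walks = sample_walks(adj, walks_per_node=2, length=5)
    print(len(walks), "walks sampled; first walk:", walks[0])
```

Each walk starts at a target region and only visits nearby regions, so the resulting sequence depends on local neighborhood structure rather than on the layout of the whole city. In a pipeline like the one described above, such sequences would then be passed to a sequence encoder; that step is omitted here.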
