All Cities are Equal: A Unified Human Mobility Generation Model Enabled by LLMs

Bo Liu, Tong Li, Zhu Xiao, Ruihui Li, Geyong Min, Zhuo Tang, Kenli Li

TL;DR

UniMob tackles the challenge of generating high-fidelity synthetic human mobility across cities with varying data resources. It combines an LLM-based travel planner to infer temporally-aware and semantically meaningful travel plans, a unified spatial embedding to map diverse city layouts into a shared representation, and a diffusion-based mobility generator conditioned on the inferred plans. Across two real-world datasets spanning five cities, UniMob outperforms state-of-the-art baselines by over 30% on multiple spatiotemporal metrics, and demonstrates robust zero-shot and few-shot generalization, privacy preservation, and utility for downstream tasks. The work enables equitable, cross-city mobility synthesis with practical impact on urban analytics, planning, and resource allocation, particularly in data-scarce environments.

Abstract

Synthetic human mobility generation is gaining traction as an ethical and practical approach to supporting the data needs of intelligent urban systems. Existing methods perform well primarily in data-rich cities, while their effectiveness declines significantly in cities with limited data resources. However, the ability to generate reliable human mobility data should not depend on a city's size or available resources; all cities deserve equal consideration. To address this open issue, we propose UniMob, a unified human mobility generation model across cities. UniMob is composed of three main components: an LLM-powered travel planner that derives high-level, temporally-aware, and semantically meaningful travel plans; a unified spatial embedding module that projects the spatial regions of various cities into a shared representation space; and a diffusion-based mobility generator that captures the joint spatiotemporal characteristics of human movement, guided by the derived travel plans. We evaluate UniMob extensively using two real-world datasets covering five cities. Comprehensive experiments show that UniMob significantly outperforms state-of-the-art baselines, achieving improvements of over 30% across multiple evaluation metrics. Further analysis demonstrates UniMob's robustness in both zero- and few-shot scenarios, underlines the importance of LLM guidance, verifies its privacy-preserving nature, and showcases its applicability for downstream tasks.

Paper Structure (22 sections, 12 equations, 10 figures, 6 tables, 1 algorithm)

Figures (10)

  • Figure 1: Visualization of POI transition analysis based on mobile phone datasets from Beijing and Shanghai. Similar transition patterns from departure (left) to destination (right) can be observed in Beijing and Shanghai.
  • Figure 2: Framework overview of UniMob. Based on a start time $t_0$ and the POI distribution of a start region $\mathbf{p}_{r_0}$, the travel planner uses a fine-tuned LLM to infer temporal and semantic travel plans driven by individual intent. The unified spatial embedding maps diverse spatial structures into a unified space, enabling human mobility generation across cities. The mobility generator synthesizes human mobility through a denoising process conditioned on the travel plans.
  • Figure 3: Example of inferring travel plans using a fine-tuned LLM. The prompt consists of a task instruction, the current time, and the origin POI distribution; the LLM generates temporal and semantic travel plans consisting of the next arrival time and the destination POI distribution.
  • Figure 4: Description of the unified spatial embedding. In this symmetric encoder-decoder architecture, a spatial-temporal encoder embeds the diverse spatial structures in regional mobility sequences into unified spatial representations, and a lightweight decoder maps these representations back onto city-specific spatial identifiers. When generalizing to a new city, the encoder is frozen and the decoder is trained on a small subset of labeled samples.
  • Figure 5: Diffusion Transformer (DiT) block. The conditions, including the time step $t$, start region $r_0$, semantic travel plan $D$, and temporal travel plan $M$, are first embedded into $t'$, $r_0'$, $D'$, and $M'$. By progressively denoising the noised latent, the DiT blocks generate unified spatial representations.
  • ...and 5 more figures
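
The prompt layout described in the Figure 3 caption (task instruction, current time, origin POI distribution) could be assembled roughly as follows. This is only a minimal sketch: the function name `build_travel_plan_prompt`, the POI category names, the instruction wording, and the time format are all hypothetical and are not taken from the paper.

```python
from datetime import datetime


def build_travel_plan_prompt(current_time: datetime, origin_poi: dict[str, float]) -> str:
    """Assemble a prompt with the three parts named in the Figure 3 caption.

    All wording and formatting here is an illustrative assumption.
    """
    total = sum(origin_poi.values())
    if total <= 0:
        raise ValueError("origin POI distribution must have positive mass")
    # Normalize raw counts into shares and list categories in a stable order.
    shares = {name: count / total for name, count in sorted(origin_poi.items())}
    poi_text = ", ".join(f"{name}: {share:.2f}" for name, share in shares.items())
    return "\n".join([
        # Part 1: task instruction (hypothetical wording).
        "Task: predict the next arrival time and the destination POI distribution.",
        # Part 2: current time.
        f"Current time: {current_time:%Y-%m-%d %H:%M}",
        # Part 3: POI distribution of the origin region.
        f"Origin POI distribution: {poi_text}",
    ])


prompt = build_travel_plan_prompt(
    datetime(2024, 5, 6, 8, 30),
    {"residential": 6, "restaurant": 3, "office": 1},
)
print(prompt)
```

In this sketch the counts are normalized before formatting, so regions with different numbers of POIs yield comparable inputs; whether the paper normalizes in this way is not stated in the caption.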