UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction

Yuan Yuan, Jingtao Ding, Jie Feng, Depeng Jin, Yong Li

TL;DR

UniST tackles the lack of universality in urban spatio-temporal prediction by proposing a two-stage framework: large-scale pre-training over diverse spatio-temporal data and knowledge-guided prompt learning to adapt to varied patterns across scenarios. It employs a Transformer-based encoder–decoder with spatio-temporal patching, four self-supervised masking strategies, and a memory-augmented prompt learner that leverages domain knowledge (spatial closeness, hierarchy; temporal closeness, periodicity) to generate dynamic prompts for cross-dataset generalization. Empirical results across more than 20 datasets show UniST achieving state-of-the-art performance in short- and long-term predictions, with particularly strong few-shot and zero-shot capabilities, highlighting robust cross-domain transfer. The work demonstrates the practical potential of universal spatio-temporal models and suggests future integration of heterogeneous data formats (grid, sequence, graph) to further enhance universality and resilience in urban forecasting applications.
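
The patching and masking steps mentioned above can be illustrated with a minimal sketch, assuming a grid-shaped input of shape (T, H, W). The function names (`patchify`, `random_mask`, `temporal_mask`), the patch sizes, and the choice of these two mask types are assumptions made for illustration. This summary does not name the four masking strategies, and the code is not the authors' implementation.

```python
import numpy as np

def patchify(x, pt, ph, pw):
    """Split a (T, H, W) grid into non-overlapping (pt, ph, pw) patches.

    Returns an array of shape (num_patches, pt * ph * pw), one row per patch.
    """
    T, H, W = x.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "sizes must divide evenly"
    nt, nh, nw = T // pt, H // ph, W // pw
    x = x.reshape(nt, pt, nh, ph, nw, pw)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # -> (nt, nh, nw, pt, ph, pw)
    return x.reshape(nt * nh * nw, pt * ph * pw)

def random_mask(num_patches, ratio, rng):
    """Boolean mask (True = hidden) over a random subset of patches."""
    n_hide = int(round(num_patches * ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=n_hide, replace=False)] = True
    return mask

def temporal_mask(nt, nh, nw, n_future):
    """Hide every patch in the last n_future temporal steps (forecast-style)."""
    mask = np.zeros((nt, nh, nw), dtype=bool)
    mask[nt - n_future:] = True
    return mask.reshape(-1)

# Hypothetical example: 12 time steps on an 8 x 8 grid.
rng = np.random.default_rng(0)
x = np.arange(12 * 8 * 8, dtype=np.float32).reshape(12, 8, 8)
tokens = patchify(x, pt=2, ph=4, pw=4)          # 6 * 2 * 2 = 24 patches
m_random = random_mask(len(tokens), 0.5, rng)   # hides half of the patches
m_future = temporal_mask(6, 2, 2, n_future=2)   # hides the last 2 temporal steps
visible = tokens[~m_future]                     # patches an encoder would see
```

In a masked pre-training setup of this kind, the model sees only the visible patches and is trained to reconstruct the hidden ones. A temporal mask of this form turns reconstruction into a forecasting task, while a random mask encourages the model to use both spatial and temporal context.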

Abstract

Urban spatio-temporal prediction is crucial for informed decision-making, such as traffic management, resource optimization, and emergency response. Despite remarkable breakthroughs in pretrained natural language models that enable one model to handle diverse tasks, a universal solution for spatio-temporal prediction remains challenging. Existing prediction approaches are typically tailored for specific spatio-temporal scenarios, requiring task-specific model designs and extensive domain-specific training data. In this study, we introduce UniST, a universal model designed for general urban spatio-temporal prediction across a wide range of scenarios. Inspired by large language models, UniST achieves success through: (i) utilizing diverse spatio-temporal data from different scenarios, (ii) effective pre-training to capture complex spatio-temporal dynamics, and (iii) knowledge-guided prompts to enhance generalization capabilities. These designs together unlock the potential of building a universal model for various scenarios. Extensive experiments on more than 20 spatio-temporal scenarios demonstrate UniST's efficacy in advancing state-of-the-art performance, especially in few-shot and zero-shot prediction. The datasets and code implementation are released at https://github.com/tsinghua-fib-lab/UniST.

Paper Structure

This paper contains 42 sections, 9 equations, 13 figures, 16 tables, 2 algorithms.

Figures (13)

  • Figure 1: The transition from traditional separate deep learning models to a one-for-all universal model for urban spatio-temporal prediction.
  • Figure 2: Overview of the UniST architecture, which consists of two stages: (i) large-scale spatio-temporal pre-training and (ii) spatio-temporal knowledge-guided prompt learning.
  • Figure 3: Illustration of the prompt generation process.
  • Figure 4: (a) Few-shot performance of UniST and baselines on the Crowd and BikeNYC datasets using only 1% of the training data. (b) Few-shot performance of UniST and baselines using only 5% of the training data. The dashed red lines denote the zero-shot performance of UniST.
  • Figure 5: Ablation studies on four traffic speed datasets: Chengdu (CD), Shanghai (SH), Changsha (CS), and Jinan (JN). (a) illustrates the results of removing a prompt guided by one type of spatio-temporal knowledge. (b) presents the results of varying the number of learnable embeddings in the temporal and spatial memory pools.
  • ...and 8 more figures