Foundation Models for Spatio-Temporal Data Science: A Tutorial and Survey

Yuxuan Liang, Haomin Wen, Yutong Xia, Ming Jin, Bin Yang, Flora Salim, Qingsong Wen, Shirui Pan, Gao Cong

TL;DR

This work addresses the lack of a systematic study of Spatio-Temporal Foundation Models (STFMs) spanning the full ST data science workflow: sensing, management, mining, and downstream tasks. It provides a taxonomy contrasting Large Language Models (LLMs) and Pretrained Foundation Models (PFMs), detailing their architectures, pretraining schemes, and data modalities, and it highlights the core STFM capabilities of perception, optimization, and reasoning. The paper synthesizes mechanisms for real-world sensing, synthetic data generation, data cleaning, retrieval, and integration, as well as numerical and inferential tasks such as forecasting, event analysis, grounding, and scenario simulation. By outlining a comprehensive framework and future directions, the survey aims to catalyze scalable, adaptable STFMs that integrate across ST workflows and support intelligent, data-driven decision-making in domains such as urban computing and climate science.

Abstract

Spatio-Temporal (ST) data science, which includes sensing, managing, and mining large-scale data across space and time, is fundamental to understanding complex systems in domains such as urban computing, climate science, and intelligent transportation. Traditional deep learning approaches have significantly advanced this field, particularly in the ST data mining stage. However, these models remain task-specific and often require extensive labeled data. Inspired by the success of Foundation Models (FMs), especially large language models, researchers have begun exploring Spatio-Temporal Foundation Models (STFMs) to enhance adaptability and generalization across diverse ST tasks. Unlike prior architectures, STFMs empower the entire workflow of ST data science, from data sensing and management to mining, thereby offering a more holistic and scalable approach. Despite rapid progress, a systematic study of STFMs for ST data science remains lacking. This survey aims to provide a comprehensive review of STFMs, categorizing existing methodologies and identifying key research directions to advance ST general intelligence.

Paper Structure

This paper contains 33 sections, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: ST Foundation Models (STFMs), which include LLMs and PFMs, are pretrained with or applied to diverse ST data and have the abilities of perception, optimization, and reasoning. STFMs can, in turn, enhance each stage of ST data science.
  • Figure 2: Illustration of various types of ST data.
  • Figure 3: The framework of STFMs for ST data science.
  • Figure 4: STFMs for addressing inferential problems.
  • Figure 5: A method-centric taxonomy (abridged; the full version appears as a separate figure in the paper).
  • ...and 2 more figures