Foundation Models for Spatio-Temporal Data Science: A Tutorial and Survey
Yuxuan Liang, Haomin Wen, Yutong Xia, Ming Jin, Bin Yang, Flora Salim, Qingsong Wen, Shirui Pan, Gao Cong
TL;DR
This work addresses the lack of a systematic study of Spatio-Temporal Foundation Models (STFMs) spanning the full ST data science workflow: sensing, management, mining, and downstream tasks. It provides a taxonomy contrasting Large Language Models (LLMs) and Pretrained Foundation Models (PFMs), detailing their architectures, pretraining schemes, and data modalities, and it highlights the core capabilities of STFMs in perception, optimization, and reasoning. The paper synthesizes mechanisms for real-world sensing, synthetic data generation, data cleaning, retrieval, and integration, and covers numerical and inferential tasks such as forecasting, event analysis, grounding, and scenario simulation. By outlining a comprehensive framework and future directions, the survey aims to catalyze scalable, adaptable STFMs that integrate across ST workflows and support intelligent, data-driven decision making in domains such as urban computing and climate science.
Abstract
Spatio-Temporal (ST) data science, which encompasses sensing, managing, and mining large-scale data across space and time, is fundamental to understanding complex systems in domains such as urban computing, climate science, and intelligent transportation. Traditional deep learning approaches have significantly advanced this field, particularly in the ST data mining stage. However, these models remain task-specific and often require extensive labeled data. Inspired by the success of Foundation Models (FMs), especially large language models, researchers have begun exploring Spatio-Temporal Foundation Models (STFMs) to enhance adaptability and generalization across diverse ST tasks. Unlike prior architectures, STFMs empower the entire ST data science workflow, from data sensing and management to mining, thereby offering a more holistic and scalable approach. Despite rapid progress, a systematic study of STFMs for ST data science is still lacking. This survey aims to provide a comprehensive review of STFMs, categorizing existing methodologies and identifying key research directions to advance ST general intelligence.
