Spatio-Temporal Foundation Models: Vision, Challenges, and Opportunities

Adam Goodge, Wee Siong Ng, Bryan Hooi, See Kiong Ng

TL;DR

Spatio-temporal foundation models (STFMs) aim to learn universal patterns from diverse spatio-temporal data to generalize across tasks. The paper defines four generalization axes—domain, spatial, temporal, and scale—and assesses current STFMs against these ideals, identifying fragmentation between transportation- and weather-focused work and limited cross-domain evaluation. It outlines opportunities for progress through unified architectures, cross-domain synergies, multi-modal training, and adaptation to distribution shift, providing a roadmap for developing broadly applicable STFMs. The work highlights the potential of STFMs to enable robust, scalable spatio-temporal reasoning across domains such as urban systems, climate, and public health, with practical implications for forecasting, monitoring, and decision support.

Abstract

Foundation models have revolutionized artificial intelligence, setting new benchmarks in performance and enabling transformative capabilities across a wide range of vision and language tasks. However, despite the prevalence of spatio-temporal data in critical domains such as transportation, public health, and environmental monitoring, spatio-temporal foundation models (STFMs) have not yet achieved comparable success. In this paper, we articulate a vision for the future of STFMs, outlining their essential characteristics and the generalization capabilities necessary for broad applicability. We critically assess the current state of research, identifying gaps relative to these ideal traits, and highlight key challenges that impede their progress. Finally, we explore potential opportunities and directions to advance research towards the aim of effective and broadly applicable STFMs.

Paper Structure (26 sections, 4 figures, 1 table)

Figures (4)

  • Figure 1: The four types of spatio-temporal data (raster, point reference, trajectory and events), with example use cases and illustrations.
  • Figure 2: A framework for spatio-temporal foundation models (STFMs). Top-left: STFMs can flexibly handle various forms of ST data as input (see Section \ref{sec:data}). Middle-left: Novel techniques for training STFMs to handle multiple domains. Bottom-left: Complementary information in a variety of modalities can be incorporated into STFMs as guidance to perform specific tasks. Top-right: The generalization capabilities expected of STFMs (see Section \ref{sec:gc}). Middle-right: A variety of relevant applications for STFMs. Bottom-right: Diverse types of tasks that STFMs should perform.
  • Figure 3: Four forms of generalization in spatio-temporal data. Top-left: domain generalization across different sources of data representing different physical systems and categories of applications. Top-right: spatial generalization across different locations or regions in space. Bottom-left: temporal generalization across different periods and intervals of time. Bottom-right: scale generalization across different resolutions, frequencies or granularities of data.
  • Figure 4: A selection of spatio-temporal domains that STFMs should be able to generalize across. Dashed lines indicate examples of potential correlations or relationships in data patterns between different domains.

Theorems & Definitions (2)

  • Definition 3.1
  • Definition 3.2