Spatio-Temporal Foundation Models: Vision, Challenges, and Opportunities
Adam Goodge, Wee Siong Ng, Bryan Hooi, See Kiong Ng
TL;DR
Spatio-temporal foundation models (STFMs) aim to learn universal patterns from diverse spatio-temporal data so that a single model can generalize across tasks. The paper defines four generalization axes (domain, spatial, temporal, and scale) and assesses current STFMs against these ideals, identifying fragmentation between transportation- and weather-focused work and limited cross-domain evaluation. It outlines opportunities for progress through unified architectures, cross-domain synergies, multi-modal training, and adaptation to distribution shift, providing a roadmap for developing broadly applicable STFMs. The work highlights the potential of STFMs to enable robust, scalable spatio-temporal reasoning in domains such as urban systems, climate, and public health, with practical implications for forecasting, monitoring, and decision support.
Abstract
Foundation models have revolutionized artificial intelligence, setting new benchmarks in performance and enabling transformative capabilities across a wide range of vision and language tasks. However, despite the prevalence of spatio-temporal data in critical domains such as transportation, public health, and environmental monitoring, spatio-temporal foundation models (STFMs) have not yet achieved comparable success. In this paper, we articulate a vision for the future of STFMs, outlining their essential characteristics and the generalization capabilities necessary for broad applicability. We critically assess the current state of research, identify gaps relative to these ideal traits, and highlight key challenges that impede progress. Finally, we explore opportunities and directions for advancing research towards effective and broadly applicable STFMs.
