Uncovering Zero-Shot Generalization Gaps in Time-Series Foundation Models Using Real-World Videos
Lujun Li, Lama Sleem, Yiqun Wang, Yangjie Xu, Niccolò Gentile, Radu State
TL;DR
This work introduces REAL-V-TSFM, a dataset of time series extracted from real-world videos via optical flow to probe zero-shot generalization of time-series foundation models. The authors demonstrate that state-of-the-art TSFMs, while strong on conventional benchmarks, struggle to forecast real-world motion-derived sequences, highlighting a gap between synthetic and real dynamics. They present a robust optical-flow–based extraction pipeline and an extensive analysis across multiple models, showing that real-world temporal signals pose considerable generalization challenges. The study advocates data-centric benchmarking and suggests directions for broader pretraining data and enhanced augmentation strategies to improve TSFM universality.
Abstract
Recent research on time-series foundation models (TSFMs) has underscored the scarcity of real-world data. Existing datasets often supplement it with synthetic sources, whose generalizability remains debated. In this work, we propose a novel benchmarking approach: we build a curated dataset that reflects real-world physical temporal dynamics by extracting temporal signals from real-world videos using optical flow. The result is REAL-V-TSFM, a novel dataset designed to capture rich and diverse time series derived from real-world videos. Experimental results on state-of-the-art TSFMs under zero-shot forecasting show that, despite strong performance on conventional benchmarks, these models exhibit performance degradation on the proposed dataset, suggesting limited generalizability to novel datasets. These findings underscore the need for novel approaches to acquiring time-series data and highlight the lack of universality in recent TSFMs, while further validating the effectiveness of our video-based time-series extraction pipeline.
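
To make the extraction idea concrete, the following is a minimal sketch of how motion in a video can be turned into a time series with optical flow. It is an illustration under stated assumptions, not the authors' pipeline: frames are assumed to be grayscale images stored as nested lists of floats, and a single global Lucas-Kanade-style least-squares estimate per frame pair stands in for whatever optical-flow method and point tracking the paper actually uses. The function names (`global_flow`, `frames_to_series`, `make_blob_frame`) are hypothetical.

```python
import math


def global_flow(prev, curr):
    """Estimate one global (u, v) displacement between two grayscale frames.

    Solves the brightness-constancy equation Ix*u + Iy*v + It = 0 in the
    least-squares sense over all interior pixels (a global Lucas-Kanade-style
    estimate; an assumption, not the paper's method).
    """
    h, w = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference spatial gradients, averaged over both frames.
            ix = (prev[y][x + 1] - prev[y][x - 1]
                  + curr[y][x + 1] - curr[y][x - 1]) / 4.0
            iy = (prev[y + 1][x] - prev[y - 1][x]
                  + curr[y + 1][x] - curr[y - 1][x]) / 4.0
            # Temporal gradient between the two frames.
            it = curr[y][x] - prev[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        # No usable texture: report zero motion instead of dividing by ~0.
        return (0.0, 0.0)
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = [-sxt, -syt].
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return (u, v)


def frames_to_series(frames):
    """Turn a frame sequence into a time series of (u, v) displacements."""
    return [global_flow(a, b) for a, b in zip(frames, frames[1:])]


def make_blob_frame(h, w, cx, cy, sigma=3.0):
    """Synthetic test frame: a Gaussian blob centred at column cx, row cy."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    # A blob drifting 0.5 pixels per frame along x (increasing column index).
    frames = [make_blob_frame(32, 32, 12.0 + 0.5 * t, 16.0) for t in range(5)]
    for step, (u, v) in enumerate(frames_to_series(frames)):
        print(step, round(u, 3), round(v, 3))
```

Each element of the returned list is one time step, so the horizontal and vertical components can be treated as two univariate series and passed to a forecasting model. On real footage one would more likely use a dense or feature-tracking optical-flow implementation from a vision library and track several points per video; the global estimate here is only meant to show the signal-from-motion principle.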
