Uncovering Zero-Shot Generalization Gaps in Time-Series Foundation Models Using Real-World Videos

Lujun Li, Lama Sleem, Yiqun Wang, Yangjie Xu, Niccolò Gentile, Radu State

TL;DR

This work introduces REAL-V-TSFM, a dataset of time series extracted from real-world videos via optical flow to probe zero-shot generalization of time-series foundation models. The authors demonstrate that state-of-the-art TSFMs, while strong on conventional benchmarks, struggle to forecast real-world motion-derived sequences, highlighting a gap between synthetic and real dynamics. They present a robust optical-flow–based extraction pipeline and an extensive analysis across multiple models, showing that real-world temporal signals pose considerable generalization challenges. The study advocates data-centric benchmarking and suggests directions for broader pretraining data and enhanced augmentation strategies to improve TSFM universality.

Abstract

Recent research on time-series foundation models (TSFMs) has underscored the scarcity of real-world data; existing datasets often supplement it with synthetic sources, whose generalizability remains debated. In this work, we propose a novel benchmarking approach: we build a curated dataset that reflects real-world physical temporal dynamics by extracting temporal signals from real-world videos using optical flow. The resulting dataset, REAL-V-TSFM, is designed to capture rich and diverse time series derived from real-world videos. Experimental results on state-of-the-art TSFMs under zero-shot forecasting show that, despite strong performance on conventional benchmarks, these models exhibit performance degradation on the proposed dataset, suggesting limited generalizability to novel datasets. These findings underscore the need for novel approaches to acquiring time series data and highlight the lack of universality in recent TSFMs, while further validating the effectiveness of our video-based time series data extraction pipeline.
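
The abstract does not spell out how a video becomes a time series, so the following is only a minimal sketch of the general idea, not the authors' pipeline. It assumes a single-window Lucas-Kanade estimate of global translation between consecutive frames (a swapped-in, simplified optical-flow method), and it uses synthetic frames containing one moving Gaussian blob in place of real video. The names `make_frame`, `estimate_flow`, and `motion_series` are hypothetical and chosen for illustration.

```python
import math


def make_frame(size, cx, cy, sigma=4.0):
    """Synthetic grayscale frame (assumption): one Gaussian blob at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]


def estimate_flow(prev, curr):
    """Single-window Lucas-Kanade: least-squares (u, v) for Ix*u + Iy*v = -It."""
    h, w = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference spatial gradients, averaged over both frames.
            ix = (prev[y][x + 1] - prev[y][x - 1]
                  + curr[y][x + 1] - curr[y][x - 1]) / 4.0
            iy = (prev[y + 1][x] - prev[y - 1][x]
                  + curr[y + 1][x] - curr[y - 1][x]) / 4.0
            # Temporal gradient between the two frames.
            it = curr[y][x] - prev[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return 0.0, 0.0  # no usable texture: report zero motion
    # Solve the 2x2 normal equations with Cramer's rule.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


def motion_series(frames):
    """Accumulate per-frame flow into a cumulative displacement time series."""
    series, x, y = [], 0.0, 0.0
    for prev, curr in zip(frames, frames[1:]):
        u, v = estimate_flow(prev, curr)
        x, y = x + u, y + v
        series.append((x, y))
    return series


# Five synthetic frames: the blob drifts 0.3 pixels to the right per frame.
frames = [make_frame(32, 15.0 + 0.3 * t, 16.0) for t in range(5)]
series = motion_series(frames)
print([(round(x, 2), round(y, 2)) for x, y in series])
```

Each entry of `series` is the estimated cumulative (x, y) displacement after one more frame, so either coordinate can be read as a univariate time series for forecasting. A real pipeline would more likely use dense or pyramidal optical flow on tracked regions, since the single-window estimate here is only reliable for small, smooth motions.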

Paper Structure

This paper contains 18 sections, 1 equation, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Dataset production workflow consisting of six steps
  • Figure 2: PCA projection of the proposed dataset and the M4-Daily dataset. The color of the heatmap represents the density level, with darker colors indicating regions of higher point concentration. The contour lines (also referred to as iso-density or equal-density curves) connect points sharing the same density value.
  • Figure 3: Ten time series extracted from real video data
  • Figure 4: Performance comparison across different objects in REAL-V-TSFM