CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data

Shifeng Xie; Vasilii Feofanov; Ambroise Odonnat; Lei Zan; Marius Alonso; Jianfeng Zhang; Themis Palpanas; Lujia Pan; Keli Zhang; Ievgen Redko

TL;DR

This work proposes CauKer, an algorithm that generates diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions. CauKer-generated datasets exhibit clear scaling laws in both dataset size and model capacity, whereas real-world datasets display irregular scaling behavior.

Abstract

Time series foundation models (TSFMs) have recently gained significant attention due to their strong zero-shot capabilities and widespread real-world applications. Such models typically require computationally costly pre-training on large-scale, carefully curated collections of real-world sequences. To enable sample-efficient pre-training of TSFMs, we propose CauKer, a novel algorithm designed to generate diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions. CauKer combines Gaussian Process (GP) kernel composition with Structural Causal Models (SCMs) to produce data for sample-efficient pre-training of state-of-the-art classification TSFMs that differ in architecture and pre-training approach. Additionally, our experiments reveal that CauKer-generated datasets exhibit clear scaling laws for both dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), unlike real-world datasets, which display irregular scaling behavior. The source code is publicly available at https://github.com/ShifengXIE/CauKer.
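The abstract names the two ingredients (GP kernel composition and an SCM) but does not specify the kernels, the graph structure, or the edge functions. The following is a minimal illustrative sketch of how such a combination might look, not the authors' implementation. Every function name, kernel choice, activation, and hyperparameter below is an assumption made for illustration; the repository linked above is the authoritative reference.

```python
import numpy as np


def rbf_kernel(t, length_scale=0.2):
    """Squared-exponential kernel: smooth local variation."""
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def periodic_kernel(t, period=0.25, length_scale=1.0):
    """Periodic kernel: seasonality with the given period."""
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length_scale ** 2)


def linear_kernel(t):
    """Linear kernel: trend-like behavior."""
    return t[:, None] * t[None, :]


def sample_gp(kernel_matrix, rng, jitter=1e-6):
    """Draw one zero-mean GP sample; jitter keeps the covariance well conditioned."""
    n = kernel_matrix.shape[0]
    cov = kernel_matrix + jitter * np.eye(n)
    return rng.multivariate_normal(np.zeros(n), cov)


def generate_series(n_nodes=4, length=128, seed=0):
    """Hypothetical generator: GP samples propagated through a random DAG."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, length)

    # Kernel composition (assumed): sums and products of valid kernels are valid kernels.
    composite = linear_kernel(t) + rbf_kernel(t) * periodic_kernel(t)

    # Assumed pool of nonlinear edge functions.
    activations = [np.tanh, np.sin, lambda x: np.maximum(x, 0.0)]

    nodes = []
    for i in range(n_nodes):
        noise = sample_gp(composite, rng)
        if i == 0:
            # The root node is a pure GP sample.
            nodes.append(noise)
            continue
        # Parents are drawn only from earlier nodes, so the graph is acyclic.
        n_parents = rng.integers(1, i + 1)
        parents = rng.choice(i, size=n_parents, replace=False)
        weights = rng.normal(size=n_parents)
        mix = sum(w * nodes[p] for w, p in zip(weights, parents))
        f = activations[rng.integers(len(activations))]
        # Child = nonlinear function of its parents plus GP-distributed noise.
        nodes.append(f(mix) + 0.1 * noise)
    return np.stack(nodes)
```

Calling `generate_series(n_nodes=4, length=128, seed=0)` returns an array of shape `(4, 128)`, one row per node of the hypothetical causal graph. In this sketch the trend and seasonality come from the composed kernel, and the nonlinear interactions come from the edge functions.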
