CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data

Shifeng Xie, Vasilii Feofanov, Ambroise Odonnat, Lei Zan, Marius Alonso, Jianfeng Zhang, Themis Palpanas, Lujia Pan, Keli Zhang, Ievgen Redko

TL;DR

This work proposes CauKer, a novel algorithm designed to generate diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions, and reveals that CauKer-generated datasets exhibit clear scaling laws for both dataset size and model capacity, unlike real-world datasets, which display irregular scaling behavior.

Abstract

Time series foundation models (TSFMs) have recently gained significant attention due to their strong zero-shot capabilities and widespread real-world applications. Such models typically require a computationally costly pre-training on large-scale, carefully curated collections of real-world sequences. To allow for a sample-efficient pre-training of TSFMs, we propose CauKer, a novel algorithm designed to generate diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions. CauKer combines Gaussian Process (GP) kernel composition with Structural Causal Models (SCM) to produce data for sample-efficient pre-training of state-of-the-art classification TSFMs having different architectures and following different pre-training approaches. Additionally, our experiments reveal that CauKer-generated datasets exhibit clear scaling laws for both dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), unlike real-world datasets, which display irregular scaling behavior. The source code is publicly available at https://github.com/ShifengXIE/CauKer.

Paper Structure

This paper contains 66 sections, 6 equations, 13 figures, 13 tables, 1 algorithm.

Figures (13)

  • Figure 1: An illustration of the proposed CauKer pipeline. Kernels sampled from the kernel bank $\mathcal{K}$ are randomly combined and used together with sampled mean functions to form GP priors. Time series sampled from these GP priors act as root nodes in a directed acyclic graph that encodes causal dependencies between nodes. Each edge of this graph applies an activation function from a predefined activation function bank and aggregates over incoming edges using a random linear transformation to propagate transformed time series through the graph. Intermediate node outputs are optionally interpolated to fixed length, forming the final synthetic dataset. This procedure yields rich, diverse, and causally consistent time series for self-supervised pre-training.
  • Figure 2: Clustering structure of a CauKer-generated dataset with 200 time series.
  • Figure 3: Scaling laws of MOMENT and Mantis as a function of dataset size (left and middle left, respectively), with models trained on different subsets of the UEA and CauKer datasets, and scaling laws for the same models as a function of model size (middle right and right, respectively).
  • Figure 4: Mantis embeddings of $100\text{K}$ time series drawn from UCR, UEA and generated by CauKer.
  • Figure 5: (Top row) Non-linearity statistics of the Mantis models pre-trained on CauKer synthetic datasets of varying size (left) compared to UEA (right); (Bottom row) CKA similarities calculated across the hidden layers of the pre-trained models.
  • ...and 8 more figures
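
The pipeline described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The contents of the kernel, mean, and activation banks, the add-or-multiply composition rule, the graph sampling scheme, and every name and default value below (`KERNEL_BANK`, `generate_series`, `max_parents`, and so on) are assumptions made for illustration only.

```python
import numpy as np

# --- Hypothetical banks (assumed for illustration, not taken from the paper) ---
def rbf_kernel(t, length_scale=0.2):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def periodic_kernel(t, period=0.25, length_scale=1.0):
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length_scale ** 2)

def linear_kernel(t):
    return t[:, None] * t[None, :]

KERNEL_BANK = [rbf_kernel, periodic_kernel, linear_kernel]
MEAN_BANK = [lambda t: np.zeros_like(t), lambda t: 2.0 * t,
             lambda t: np.sin(2.0 * np.pi * t)]
ACTIVATION_BANK = [np.tanh, np.sin, lambda x: np.maximum(x, 0.0), lambda x: x]

def sample_composite_kernel(t, rng, n_kernels=3):
    """Randomly combine kernels from the bank by addition or multiplication.
    Sums and elementwise products of PSD matrices remain PSD."""
    K = KERNEL_BANK[rng.integers(len(KERNEL_BANK))](t)
    for _ in range(n_kernels - 1):
        K_next = KERNEL_BANK[rng.integers(len(KERNEL_BANK))](t)
        K = K + K_next if rng.random() < 0.5 else K * K_next
    return K

def sample_root(t, rng):
    """Draw one time series from a GP prior with a sampled mean and kernel."""
    K = sample_composite_kernel(t, rng)
    mean = MEAN_BANK[rng.integers(len(MEAN_BANK))](t)
    # Eigendecomposition sampling; clipping guards against tiny negative
    # eigenvalues caused by floating-point error.
    w, V = np.linalg.eigh(K)
    w = np.clip(w, 0.0, None)
    return mean + V @ (np.sqrt(w) * rng.standard_normal(len(t)))

def generate_series(length=128, n_roots=3, n_nodes=8, max_parents=2,
                    out_length=None, seed=0):
    """Propagate GP root series through a random DAG (an SCM sketch)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, length)
    nodes = [sample_root(t, rng) for _ in range(n_roots)]
    for _ in range(n_nodes - n_roots):
        # Parents are drawn only from earlier nodes, so the graph is acyclic.
        k = rng.integers(1, min(max_parents, len(nodes)) + 1)
        parents = rng.choice(len(nodes), size=k, replace=False)
        weights = rng.standard_normal(k)   # random linear aggregation
        value = rng.standard_normal()      # random bias term
        for w_edge, p in zip(weights, parents):
            act = ACTIVATION_BANK[rng.integers(len(ACTIVATION_BANK))]
            value = value + w_edge * act(nodes[p])  # per-edge activation
        nodes.append(value)
    data = np.stack(nodes[n_roots:])  # assumption: keep non-root nodes only
    if out_length is not None and out_length != length:
        # Optional interpolation to a fixed output length.
        t_out = np.linspace(0.0, 1.0, out_length)
        data = np.stack([np.interp(t_out, t, row) for row in data])
    return data
```

Calling `generate_series(length=128, out_length=64, seed=0)` returns an array with one row per non-root node, each interpolated to 64 time steps. Because parents are sampled only from previously created nodes, the dependency graph is acyclic by construction, which is the property the caption attributes to the causal graph.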