ConTSG-Bench: A Unified Benchmark for Conditional Time Series Generation

Shaocheng Lan, Shuqi Gu, Zhangzhi Xiong, Kan Ren

TL;DR

ConTSG-Bench comprises a large-scale, well-aligned dataset spanning diverse conditioning modalities and levels of semantic abstraction. It enables, for the first time, systematic evaluation of representative generation methods across these dimensions, using a comprehensive suite of metrics for generation fidelity and condition adherence.

Abstract

Conditional time series generation plays a critical role in addressing data scarcity and enabling causal analysis in real-world applications. Despite its increasing importance, the field lacks a standardized and systematic benchmarking framework for evaluating generative models across diverse conditions. To address this gap, we introduce the Conditional Time Series Generation Benchmark (ConTSG-Bench). ConTSG-Bench comprises a large-scale, well-aligned dataset spanning diverse conditioning modalities and levels of semantic abstraction. It enables, for the first time, systematic evaluation of representative generation methods across these dimensions, using a comprehensive suite of metrics for generation fidelity and condition adherence. Both the quantitative benchmarking and the in-depth analyses of conditional generation behaviors reveal the traits and limitations of current approaches, highlighting critical challenges and promising research directions, particularly with respect to precise structural controllability and downstream task utility under complex conditions.

Paper Structure (78 sections, 22 equations, 25 figures, 32 tables, 2 algorithms)

Figures (25)

  • Figure 1: Conditional time series generation with varying conditioning modalities (text, attribute, class label) and semantic abstraction levels (morphological vs. conceptual).
  • Figure 2: Model ranking under two metric groups: (left) generation fidelity that evaluates marginal distribution of generated time series; (right) condition adherence that evaluates joint/conditional alignment between time series and conditions.
  • Figure 3: Morphological vs. conceptual conditioning: absolute performance. DTW and CRPS on PTB-XL and Weather under the two condition types.
  • Figure 4: Fine-grained control evaluation. Left: Joint shapelet classification accuracy on Synth-U, where all three segment-level local patterns must be correctly generated. Middle: Segment retrieval accuracy (Acc@1) as a function of candidate pool size on TelecomTS-Segment. Right: Segment–text temporal order accuracy on TelecomTS-Segment.
  • Figure 5: Compositional generalization analysis. Left: normalized retrieval accuracy for head (closest 20% to training distribution) vs. tail (farthest 20%, novel attribute combinations) test samples; points below the diagonal indicate performance degradation on out-of-distribution combinations. Right: accuracy gap (tail $-$ head) for each model, where negative values reflect sensitivity to novel attribute combinations.
  • ...and 20 more figures
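
The head/tail comparison described in the Figure 5 caption can be illustrated with a minimal sketch. It assumes each test sample comes with a scalar distance to the training distribution and a 0/1 retrieval outcome. The function name `head_tail_gap`, its arguments, and the toy inputs are hypothetical, and the caption's normalization step is omitted because this listing does not specify it. This is not the paper's implementation.

```python
import numpy as np


def head_tail_gap(distances, correct, frac=0.2):
    """Return (head accuracy, tail accuracy, tail - head gap).

    distances: per-sample distance to the training distribution (assumed given).
    correct:   per-sample 0/1 retrieval outcome (assumed given).
    frac:      fraction of samples in each group (0.2 follows the caption).
    """
    distances = np.asarray(distances, dtype=float)
    correct = np.asarray(correct, dtype=float)
    if distances.ndim != 1 or distances.shape != correct.shape:
        raise ValueError("distances and correct must be 1-D arrays of equal length")
    if len(distances) == 0:
        raise ValueError("at least one sample is required")

    # Number of samples in the head and in the tail (at least one each).
    k = max(1, int(round(frac * len(distances))))

    # Sort samples from closest to farthest from the training distribution.
    order = np.argsort(distances, kind="stable")

    head_acc = correct[order[:k]].mean()   # closest `frac` of the samples
    tail_acc = correct[order[-k:]].mean()  # farthest `frac` of the samples
    return head_acc, tail_acc, tail_acc - head_acc


# Toy data: ten samples, so head and tail hold two samples each.
toy_distances = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
toy_correct = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1]
head, tail, gap = head_tail_gap(toy_distances, toy_correct)
print(head, tail, gap)
```

A negative gap, as in the toy data, corresponds to a point below the diagonal in the left panel: accuracy drops on the samples farthest from the training distribution.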