A Theoretical Analysis of Detecting Large Model-Generated Time Series

Junji Hou, Junzhou Zhao, Shuo Zhang, Pinghui Wang

TL;DR

The paper addresses the detection of synthetic time series produced by Time-Series Large Models (TSLMs) and argues that text-based detectors do not transfer well to time series, which have lower information density and smoother probability distributions than text. It introduces the contraction hypothesis, proving under stated assumptions that model-generated series exhibit progressively decreasing uncertainty under recursive forecasting, and uses this insight to develop the Uncertainty Contraction Estimator (UCE), a white-box detector based on the model's internal probability distributions. Theoretical results establish distributional consistency, variance scaling under sampling, and recursive variance reduction, which underpin UCE’s uncertainty-based signals. Empirically, UCE outperforms state-of-the-art baselines on 32 datasets, performs strongly in both in-distribution and zero-shot settings, and generalizes across models to Timer and Time-MoE. The work offers a principled, scalable approach to authenticating time series data in real-world applications, with potential extensions to multivariate and batch forecasting scenarios.

Abstract

Motivated by the increasing risks of data misuse and fabrication, we investigate the problem of identifying synthetic time series generated by Time-Series Large Models (TSLMs). While there is extensive research on detecting model-generated text, we find that these existing methods do not carry over to time series data because of a fundamental modality difference: time series usually have lower information density and smoother probability distributions than text, which limits the discriminative power of token-based detectors. To address this issue, we examine the subtle distributional differences between real and model-generated time series and propose the contraction hypothesis, which states that model-generated time series, unlike real ones, exhibit progressively decreasing uncertainty under recursive forecasting. We formally prove this hypothesis under theoretical assumptions on model behavior and time series structure, showing that the distributions of model-generated time series become progressively more concentrated under recursive forecasting, which leads to uncertainty contraction. We also validate the hypothesis empirically across diverse datasets. Building on this insight, we introduce the Uncertainty Contraction Estimator (UCE), a white-box detector that aggregates uncertainty metrics over successive prefixes to identify TSLM-generated time series. Extensive experiments on 32 datasets show that UCE consistently outperforms state-of-the-art baselines, offering a reliable and generalizable solution for detecting model-generated time series.
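
To make the idea of aggregating uncertainty metrics over successive prefixes concrete, here is a minimal Python sketch. It is not the paper's UCE implementation: the function names, the binned value grid, the choice of mean entropy as the aggregate, and the `toy_predict_proba` stand-in for a TSLM's internal next-step distribution are all assumptions made for illustration.

```python
import numpy as np


def uncertainty_metrics(probs, bin_centers):
    """Entropy, max-probability and variance of one categorical distribution."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()                      # normalize defensively
    bin_centers = np.asarray(bin_centers, dtype=float)
    nonzero = probs[probs > 0]                       # avoid log(0)
    entropy = float(-(nonzero * np.log(nonzero)).sum())
    max_prob = float(probs.max())
    mean = float((probs * bin_centers).sum())
    variance = float((probs * (bin_centers - mean) ** 2).sum())
    return entropy, max_prob, variance


def toy_predict_proba(prefix, bin_centers, floor=0.05):
    """Hypothetical stand-in for a white-box model's next-step distribution.

    A Gaussian bump centered on the last observed value, whose width grows
    with the variability of the prefix. A real detector would read this
    distribution from the model instead.
    """
    prefix = np.asarray(prefix, dtype=float)
    scale = np.std(np.diff(prefix)) + floor
    logits = -0.5 * ((bin_centers - prefix[-1]) / scale) ** 2
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()


def prefix_uncertainty_score(series, predict_proba, bin_centers, min_prefix=8):
    """Mean next-step entropy over successive prefixes series[:t]."""
    entropies = []
    for t in range(min_prefix, len(series)):
        probs = predict_proba(series[:t], bin_centers)
        entropy, _, _ = uncertainty_metrics(probs, bin_centers)
        entropies.append(entropy)
    return float(np.mean(entropies))


if __name__ == "__main__":
    bins = np.linspace(-3.0, 3.0, 121)
    rng = np.random.default_rng(0)
    smooth = np.linspace(-1.0, 1.0, 64)              # very regular series
    noisy = rng.normal(0.0, 0.5, size=64)            # irregular series
    print("smooth:", prefix_uncertainty_score(smooth, toy_predict_proba, bins))
    print("noisy: ", prefix_uncertainty_score(noisy, toy_predict_proba, bins))
```

Under the contraction hypothesis, a lower aggregate uncertainty would point toward a model-generated series. With the toy model the regular series simply scores lower than the irregular one, which only shows the mechanics of the aggregation and says nothing about real detection performance.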

Paper Structure

This paper contains 53 sections, 21 theorems, 69 equations, 12 figures, 6 tables.

Key Result

Lemma 4.1

For $\sigma_t^2 \geq 0$, we have $f_{\theta} \equiv f_{t}\ \text{a.e.}$

Figures (12)

  • Figure 1: Illustration of the variation in uncertainty for real and model-generated time series.
  • Figure 2: The empirical results show the trajectories of uncertainty metrics, including entropy, max-probability, and variance, for both real and model-generated time series data, illustrating a reduction in uncertainty for generated data.
  • Figure 3: Comparison of the true distribution (blue) and model's internal distributions (orange). For an ideal model, its internal distribution coincides with the true distribution.
  • Figure 4: Comparison of the internal distribution (blue, from the model logits) and modified sampling distribution (orange, from 10,000-step Monte Carlo) for a single prediction step. The model’s internal probability distribution becomes sharper under our sampling strategy.
  • Figure 5: Average AUROC and TPR (at 1% FPR) for model generation detection on In-Distribution (12 datasets) and Zero-Shot (20 datasets) scenarios.
  • ...and 7 more figures
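
For readers unfamiliar with the two metrics named in the Figure 5 caption, the following Python sketch shows one common way to compute AUROC and TPR at a fixed FPR from detector scores. It assumes that a higher score means "more likely model-generated" and that a sample is flagged when its score strictly exceeds the threshold; the function names are hypothetical, and the paper's evaluation code may differ.

```python
import numpy as np


def auroc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. Uses all pairs, so it suits small score sets.
    """
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


def tpr_at_fpr(scores_pos, scores_neg, target_fpr=0.01):
    """True-positive rate at the strictest threshold meeting target_fpr."""
    pos = np.asarray(scores_pos, dtype=float)
    neg_desc = np.sort(np.asarray(scores_neg, dtype=float))[::-1]
    # At most k negatives may be flagged without exceeding the target FPR.
    k = int(np.floor(target_fpr * len(neg_desc)))
    # Thresholding at the (k+1)-th largest negative score leaves at most
    # k negatives strictly above it.
    threshold = neg_desc[k]
    return float((pos > threshold).mean())


if __name__ == "__main__":
    generated = [0.9, 0.8, 0.7, 0.2]   # scores for model-generated series
    real = [0.1, 0.3, 0.4, 0.6]        # scores for real series
    print(auroc(generated, real))
    print(tpr_at_fpr(generated, real, target_fpr=0.01))
```

Reporting TPR at a low FPR such as 1% complements AUROC because it measures how many generated series are caught when very few real series may be falsely flagged.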

Theorems & Definitions (46)

  • Lemma 4.1
  • Corollary 4.1
  • Lemma 4.1
  • Corollary 4.2
  • Theorem 4.3
  • Definition A.1: Time Series Decomposition
  • Definition A.2: Ideal Trend Cluster of $T$
  • Definition A.3: Distinct Trends
  • Definition A.4: Ideal Dataset
  • Definition A.5: History Set
  • ...and 36 more