A Theoretical Analysis of Detecting Large Model-Generated Time Series
Junji Hou, Junzhou Zhao, Shuo Zhang, Pinghui Wang
TL;DR
The paper addresses the detection of synthetic time series produced by Time-Series Large Models (TSLMs) and argues that text-based detectors transfer poorly to time series, which have lower information density and smoother probability distributions than text. It introduces the contraction hypothesis, which states that model-generated series exhibit progressively decreasing uncertainty under recursive forecasting, proves it under assumptions on model behavior and time series structure, and uses this insight to develop the Uncertainty Contraction Estimator (UCE), a white-box detector based on the model's internal probability distributions. Theoretical results establish distributional consistency, variance scaling under sampling, and recursive variance reduction, which underpin UCE’s uncertainty-based signals. Empirically, UCE outperforms state-of-the-art baselines on 32 datasets, with strong performance in both in-distribution and zero-shot settings and demonstrated cross-model generalization to Timer and Time-MoE. The work offers a principled, scalable approach to authenticating time series data in real-world applications, with potential extensions to multivariate and batch forecasting scenarios.
Abstract
Motivated by the increasing risks of data misuse and fabrication, we investigate the problem of identifying synthetic time series generated by Time-Series Large Models (TSLMs). While there is extensive research on detecting model-generated text, we find that these existing methods are not applicable to time series data because of a fundamental modality difference: time series usually have lower information density and smoother probability distributions than text, which limits the discriminative power of token-based detectors. To address this issue, we examine the subtle distributional differences between real and model-generated time series and propose the contraction hypothesis, which states that model-generated time series, unlike real ones, exhibit progressively decreasing uncertainty under recursive forecasting. We formally prove this hypothesis under theoretical assumptions on model behavior and time series structure: the predictive distributions of model-generated time series become progressively more concentrated under recursive forecasting, leading to uncertainty contraction. We also validate the hypothesis empirically across diverse datasets. Building on this insight, we introduce the Uncertainty Contraction Estimator (UCE), a white-box detector that aggregates uncertainty metrics over successive prefixes to identify TSLM-generated time series. Extensive experiments on 32 datasets show that UCE consistently outperforms state-of-the-art baselines, offering a reliable and generalizable solution for detecting model-generated time series.
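To make the idea of aggregating an uncertainty metric over successive prefixes concrete, the following is a minimal toy sketch, not the UCE detector itself. Every name in it is an assumption made for illustration: `toy_predictive_std` is a hypothetical stand-in for the spread of a TSLM's internal predictive distribution (here simply the standard deviation of recent first differences), the uncertainty metric is taken to be Gaussian differential entropy, and the aggregation is taken to be the least-squares slope of that entropy across prefixes. A negative slope indicates shrinking uncertainty.

```python
import math
import statistics
from typing import Callable, Sequence


def gaussian_entropy(std: float) -> float:
    """Differential entropy of a Gaussian with standard deviation `std`."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std * std)


def toy_predictive_std(prefix: Sequence[float], window: int = 8) -> float:
    """Hypothetical stand-in for a model's predictive spread.

    Uses the population standard deviation of the most recent first
    differences. A real white-box detector would instead read the spread
    from the forecasting model's own predictive distribution.
    """
    recent = list(prefix)[-(window + 1):]
    diffs = [b - a for a, b in zip(recent, recent[1:])]
    # Floor the value so the logarithm in the entropy stays finite.
    return max(statistics.pstdev(diffs), 1e-6)


def contraction_score(
    series: Sequence[float],
    predictive_std: Callable[[Sequence[float]], float] = toy_predictive_std,
    min_prefix: int = 4,
) -> float:
    """Least-squares slope of per-prefix entropy against prefix index.

    Assumed aggregation rule for this sketch only: a negative value
    means the uncertainty shrinks as the prefix grows.
    """
    if len(series) < min_prefix + 1:
        raise ValueError("series too short for the chosen min_prefix")
    entropies = [
        gaussian_entropy(predictive_std(series[:t]))
        for t in range(min_prefix, len(series) + 1)
    ]
    n = len(entropies)
    x_mean = (n - 1) / 2.0
    y_mean = sum(entropies) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(entropies))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Toy inputs: a damped oscillation (spread shrinks over time) and a
# constant-amplitude oscillation (spread stays roughly level).
contracting = [(-0.8) ** t for t in range(40)]
steady = [(-1.0) ** t for t in range(40)]
score_contracting = contraction_score(contracting)
score_steady = contraction_score(steady)
```

In this toy setting the damped series yields a clearly negative score while the constant-amplitude series stays close to zero. A threshold on such a score would be one possible decision rule, but the threshold, the metric, and the aggregation used by UCE are not specified in the abstract and are not implied here.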
