Quantifying Quality of Class-Conditional Generative Models in Time-Series Domain
Alireza Koochali, Maria Walch, Sankrutyayan Thota, Peter Schichtel, Andreas Dengel, Sheraz Ahmed
TL;DR
This work tackles the absence of standardized evaluation for class-conditional time-series generative models by introducing ITS and FITD, two metrics inspired by the Inception Score and FID and built on an InceptionTime classifier. The authors validate these metrics on 80 UCR datasets and compare them with TSTR and TRTS, showing that ITS and FITD, especially when combined with TSTR, effectively capture sample quality as well as mode drop and mode collapse. ITS contrasts the classifier's conditional label distribution for each generated sample with the marginal label distribution over all samples, while FITD measures the similarity between real and generated data distributions in the classifier's learned feature space, so the two offer complementary perspectives. The study provides a practical, classifier-based framework for assessing time-series generative models, with potential impact on healthcare, weather forecasting, and fault detection, where data are scarce or imbalanced.
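To make the two ideas concrete, here is a minimal NumPy sketch. It assumes, following the Inception Score and FID that the metrics are modeled on, that ITS is the exponentiated mean KL divergence between each generated sample's predicted label distribution p(y|x) and the marginal p(y), and that FITD is the Fréchet distance between Gaussians fitted to real and generated feature vectors. The function names `inception_time_score` and `fitd` are hypothetical, and the inputs (softmax outputs and penultimate-layer features of a trained InceptionTime classifier) are assumed to be computed elsewhere; this is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np


def inception_time_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """ITS-style score from classifier softmax outputs on generated series.

    probs: shape (n_samples, n_classes); row i is p(y | x_i). Assumed to come
    from a trained InceptionTime classifier (not computed here).
    """
    probs = np.asarray(probs, dtype=np.float64)
    marginal = probs.mean(axis=0, keepdims=True)  # p(y), averaged over samples
    # KL(p(y|x_i) || p(y)) for every sample; eps guards against log(0).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))


def _sqrt_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(mat)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def fitd(feat_real: np.ndarray, feat_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feat_*: shape (n_samples, n_features), assumed to be penultimate-layer
    activations of the same trained classifier for real and generated series.
    """
    feat_real = np.asarray(feat_real, dtype=np.float64)
    feat_gen = np.asarray(feat_gen, dtype=np.float64)
    mu_r, mu_g = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feat_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feat_gen, rowvar=False))
    # Tr((S_r S_g)^{1/2}) equals the sum of square roots of the eigenvalues
    # of the symmetric PSD matrix S_r^{1/2} S_g S_r^{1/2}.
    root_r = _sqrt_psd(sigma_r)
    eig = np.linalg.eigvalsh(root_r @ sigma_g @ root_r)
    trace_sqrt = np.sum(np.sqrt(np.clip(eig, 0.0, None)))
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)
```

Two sanity checks follow directly from the formulas: confident one-hot predictions spread evenly over k classes give an ITS of k, and identical feature sets give an FITD of zero. The trace term is computed through symmetric eigendecompositions, using the fact that Σ_r Σ_g is similar to Σ_r^{1/2} Σ_g Σ_r^{1/2}, which keeps the sketch NumPy-only and avoids complex round-off from a general matrix square root.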
Abstract
Generative models are designed to address the data scarcity problem. Even though computational advances have led to an explosion in the amount of available data, some applications (e.g., health care, weather forecasting, fault detection) still suffer from insufficient data, especially in the time-series domain. Generative models are therefore essential and powerful tools, but there is still no consensus on how to assess their quality. This gap hinders the confident application of modern implicit generative models to time-series data. Inspired by assessment methods in the image domain, we introduce the InceptionTime Score (ITS) and the Fréchet InceptionTime Distance (FITD) to gauge the quality of class-conditional generative models in the time-series domain. We conduct extensive experiments on 80 different datasets to study the discriminative capabilities of the proposed metrics alongside two existing evaluation metrics: Train on Synthetic Test on Real (TSTR) and Train on Real Test on Synthetic (TRTS). The evaluation reveals that the proposed assessment method, i.e., ITS and FITD in combination with TSTR, can accurately assess the performance of class-conditional generative models.
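TSTR and TRTS are evaluation protocols rather than closed-form scores: TSTR trains a classifier on synthetic data and reports its accuracy on real data, while TRTS trains on real data and reports accuracy on synthetic data. The sketch below illustrates both under stated assumptions: a toy nearest-centroid classifier (`NearestCentroid`, hypothetical) stands in for whatever time-series classifier an actual evaluation would use, and `make_toy_dataset` (also hypothetical) produces noisy sine-wave classes. Neither reflects the paper's experimental setup.

```python
import numpy as np


class NearestCentroid:
    """Toy classifier: a hypothetical stand-in for a real time-series classifier."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "NearestCentroid":
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Squared Euclidean distance from every series to every class centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def _train_then_test(train, test, make_clf=NearestCentroid) -> float:
    (X_tr, y_tr), (X_te, y_te) = train, test
    clf = make_clf().fit(X_tr, y_tr)
    return float(np.mean(clf.predict(X_te) == y_te))


def tstr(real, synthetic, make_clf=NearestCentroid) -> float:
    """Train on Synthetic, Test on Real: accuracy on the real data."""
    return _train_then_test(synthetic, real, make_clf)


def trts(real, synthetic, make_clf=NearestCentroid) -> float:
    """Train on Real, Test on Synthetic: accuracy on the synthetic data."""
    return _train_then_test(real, synthetic, make_clf)


def make_toy_dataset(n_per_class, classes, rng, length=50, noise=0.1):
    """Hypothetical toy data: class 0 is a noisy sine, class 1 a noisy negated sine."""
    t = np.linspace(0.0, 2.0 * np.pi, length)
    X, y = [], []
    for c in classes:
        base = np.sin(t) if c == 0 else -np.sin(t)
        X.append(base + noise * rng.standard_normal((n_per_class, length)))
        y.append(np.full(n_per_class, c))
    return np.concatenate(X), np.concatenate(y)


rng = np.random.default_rng(0)
real = make_toy_dataset(100, [0, 1], rng)
good_synth = make_toy_dataset(100, [0, 1], rng)
dropped_synth = make_toy_dataset(200, [0], rng)  # generator never produces class 1
print("good:    TSTR", tstr(real, good_synth), "TRTS", trts(real, good_synth))
print("dropped: TSTR", tstr(real, dropped_synth), "TRTS", trts(real, dropped_synth))
```

In this toy setup, a generator that never produces class 1 still earns a perfect TRTS, because every sample it does produce looks realistic to the real-trained classifier, while its TSTR falls to 0.5 on the balanced real set. This illustrates why TSTR is sensitive to missing classes in a way TRTS is not.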
