Quantifying Quality of Class-Conditional Generative Models in Time-Series Domain

Alireza Koochali, Maria Walch, Sankrutyayan Thota, Peter Schichtel, Andreas Dengel, Sheraz Ahmed

TL;DR

This work addresses the lack of a standardized evaluation for class-conditional time-series generative models by introducing the InceptionTime Score (ITS) and the Fréchet InceptionTime Distance (FITD), two metrics inspired by the Inception Score and FID and built on an InceptionTime classifier. The authors validate these metrics on 80 UCR datasets and compare them with TSTR and TRTS, showing that ITS and FITD, especially when combined with TSTR, detect quality degradation, mode drop, and mode collapse. ITS compares the conditional and marginal label distributions, while FITD measures distributional similarity in the learned feature space, so the two offer complementary perspectives. The study provides a practical, discriminator-based framework for assessing time-series generative models, with potential impact on healthcare, weather forecasting, and fault detection, where data are scarce or imbalanced.
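The two ideas in the summary above can be sketched with the standard Inception Score and Fréchet distance formulas that ITS and FITD are said to be inspired by. This is a minimal illustration, not the paper's implementation: the function names `inception_style_score` and `frechet_distance` are hypothetical, the exact ITS and FITD definitions are not given in this summary, and the class probabilities and feature vectors are assumed to come from a trained InceptionTime classifier that is not included here.

```python
import numpy as np


def inception_style_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """exp of the mean KL divergence between p(y|x) and the marginal p(y).

    probs: shape (n_samples, n_classes), each row a classifier softmax output
    (assumed to come from an InceptionTime classifier, not provided here).
    """
    marginal = probs.mean(axis=0, keepdims=True)  # p(y), averaged over samples
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_*: shape (n_samples, n_features), e.g. penultimate-layer activations
    (an assumption; the summary only says "learned feature space").
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((cov_a cov_b)^(1/2)) is the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which avoids an explicit matrix sqrt.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Under these formulas, perfectly confident predictions spread evenly over four classes give a score of about 4, while predictions collapsed onto a single class give a score of 1. The Fréchet distance between a feature set and itself is 0 (up to floating-point error) and grows as the means or covariances drift apart.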

Abstract

Generative models are designed to address the data scarcity problem. Even with the explosion of available data driven by computational advances, some applications (e.g., health care, weather forecasting, fault detection) still suffer from data insufficiency, especially in the time-series domain. Generative models are therefore essential and powerful tools, but they still lack a consensus approach to quality assessment. This deficiency hinders the confident application of modern implicit generative models to time-series data. Inspired by assessment methods in the image domain, we introduce the InceptionTime Score (ITS) and the Fréchet InceptionTime Distance (FITD) to gauge the qualitative performance of class-conditional generative models in the time-series domain. We conduct extensive experiments on 80 different datasets to study the discriminative capabilities of the proposed metrics alongside two existing evaluation metrics: Train on Synthetic Test on Real (TSTR) and Train on Real Test on Synthetic (TRTS). Extensive evaluation reveals that the proposed assessment method, i.e., ITS and FITD in combination with TSTR, can accurately assess class-conditional generative model performance.
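The two existing metrics named in the abstract differ only in which dataset is used for training and which for testing. The sketch below illustrates this with a nearest-centroid classifier, which is a stand-in chosen for brevity; the abstract does not say which classifier the authors use. All function names and the toy data are hypothetical.

```python
import numpy as np


def fit_centroids(x, y):
    """Stand-in classifier: store one mean series per class."""
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(model, x):
    classes, centroids = model
    # Euclidean distance from every series to every class centroid.
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def accuracy(model, x, y):
    return float((predict(model, x) == y).mean())


def tstr(x_syn, y_syn, x_real, y_real):
    """Train on Synthetic, Test on Real."""
    return accuracy(fit_centroids(x_syn, y_syn), x_real, y_real)


def trts(x_real, y_real, x_syn, y_syn):
    """Train on Real, Test on Synthetic."""
    return accuracy(fit_centroids(x_real, y_real), x_syn, y_syn)


# Toy "real" data: two well-separated classes, five series of length 8 each.
offsets = 0.1 * np.arange(5)[:, None]
base = np.linspace(0.0, 1.0, 8)[None, :]
x_real = np.concatenate([base + offsets, base + 10.0 + offsets])
y_real = np.array([0] * 5 + [1] * 5)

# Toy "synthetic" data that dropped class 1 entirely (a mode drop).
x_drop, y_drop = x_real[:5] + 0.05, y_real[:5]

tstr_drop = tstr(x_drop, y_drop, x_real, y_real)
trts_drop = trts(x_real, y_real, x_drop, y_drop)
```

In this toy setup the dropped class lowers `tstr_drop` to 0.5, because a classifier trained only on class 0 mislabels every real class 1 series, while `trts_drop` stays at 1.0, because the surviving synthetic class is still classified correctly. This is only an illustration of how the two directions can disagree, not a reproduction of the paper's experiments.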

Paper Structure (19 sections, 4 equations, 17 figures, 2 tables)

Figures (17)

  • Figure 1: The proposed evaluation pipeline for $FITD$ and $ITS$.
  • Figure 2: Changes in the scores as data quality declines through progressively added noise.
  • Figure 3: Comparison between original and noisy data from the Chinatown dataset. Because the data values are large in scale, noise with $\sigma = 5$ does not change the data enough to cause a response in our scores.
  • Figure 4: Relative $ITS$ and $FITD$ scores when one mode is dropped from a dataset.
  • Figure 6: Relative $ITS$ and $FITD$ scores for the extreme mode drop scenario.
  • ...and 12 more figures