Utilizing Data Fingerprints for Privacy-Preserving Algorithm Selection in Time Series Classification: Performance and Uncertainty Estimation on Unseen Datasets

Lars Böcking, Leopold Müller, Niklas Kühl

TL;DR

The paper tackles privacy concerns in algorithm selection for time series classification by introducing data fingerprints that summarize datasets without revealing individual data points. It reframes algorithm selection as a multi-target regression problem that maps dataset fingerprints to expected performance and uncertainty, enabling predictions on unseen data. Evaluated for 35 algorithms on 112 UCR datasets, the method improves over a naive baseline by 7.32% on average in estimating mean performance and by 15.81% in estimating uncertainty, demonstrating a practical, privacy-preserving approach for AI service deployment. This work offers a scalable framework for informed algorithm selection in time series tasks, with potential extensions to additional objectives and interpretability for real-world service systems.

Abstract

The selection of algorithms is a crucial step in designing AI services for real-world time series classification use cases. Traditional methods such as neural architecture search, automated machine learning, combined algorithm selection, and hyperparameter optimization are effective but require considerable computational resources and access to all data points to run their optimizations. In this work, we introduce a novel data fingerprint that describes any time series classification dataset in a privacy-preserving manner and provides insight into the algorithm selection problem without requiring training on the (unseen) dataset. By decomposing the multi-target regression problem, only our data fingerprints are used to estimate algorithm performance and uncertainty in a scalable and adaptable manner. Our approach is evaluated on the 112 University of California, Riverside (UCR) benchmark datasets, demonstrating its effectiveness in predicting the performance of 35 state-of-the-art algorithms and providing valuable insights for effective algorithm selection in time series classification service systems. It improves on a naive baseline by 7.32% on average in estimating the mean performance and by 15.81% in estimating the uncertainty.
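
To make the fingerprint idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a step-wise aggregation in which each instance is reduced to a few summary statistics, these are averaged per class, and the class summaries are averaged into one dataset-level vector. Skewness and kurtosis are named in the Figure 4 caption; the remaining features, the final averaging over classes, and all function names (`instance_fingerprint`, `class_fingerprints`, `dataset_fingerprint`) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import fmean, pstdev

FEATURES = ("mean", "std", "skewness", "kurtosis")  # hypothetical feature set


def instance_fingerprint(series):
    """Summarize one time series by a few moments (illustrative f_I)."""
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:  # constant series: standardized moments are undefined, use 0.0
        return {"mean": mu, "std": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    z = [(x - mu) / sigma for x in series]
    return {
        "mean": mu,
        "std": sigma,
        "skewness": fmean([v ** 3 for v in z]),
        "kurtosis": fmean([v ** 4 for v in z]),
    }


def class_fingerprints(instances, labels):
    """Mean-aggregate instance fingerprints per class (illustrative f_C)."""
    grouped = defaultdict(list)
    for series, label in zip(instances, labels):
        grouped[label].append(instance_fingerprint(series))
    return {
        label: {k: fmean([fp[k] for fp in fps]) for k in FEATURES}
        for label, fps in grouped.items()
    }


def dataset_fingerprint(instances, labels):
    """Average the class fingerprints into one vector (assumed final step)."""
    per_class = class_fingerprints(instances, labels)
    return {k: fmean([fp[k] for fp in per_class.values()]) for k in FEATURES}


# Toy data, invented for illustration only.
instances = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1]]
labels = ["a", "a", "b"]
print(dataset_fingerprint(instances, labels))
```

Only the aggregated dictionary would need to leave the data owner in this sketch; the raw series stay local, which is the property the abstract describes as privacy-preserving.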

Paper Structure

This paper contains 14 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: Approach for performance estimation in time series classification. Inspired by Amini et al. (2020).
  • Figure 2: Problem statement: Mapping the time series instances to the algorithm performance. (a) Ten sampled time series instances of each target class and their averages in the Yoga dataset. (b) Histogram of the classification performance of different algorithms on the Yoga dataset across 30 cross-validation folds, with achieved accuracy on the x-axis and density on the y-axis.
  • Figure 3: Step-wise aggregation function of the approach.
  • Figure 4: Instance fingerprint $f_I(x^{i,d})$ for ten sampled instances of each class in the Yoga dataset and the class aggregation $f_C(\cdot)$ via $\mu$ aggregation. As indicated by their different values, skewness $\gamma_{1}$ and kurtosis $\mathrm{Kurt}[X]$ are promising characteristics to differentiate individual instances as well as aggregated class fingerprints.
  • Figure 5: Ridge regression estimations $\widehat{\mu(\mathbb{E}_{h}^{d})}$ for $h = \text{1NN-DTW}$, achieving an average improvement in MAE of $18.13\%$ on $D_{val}$ and $18.61\%$ on $D_{test}$ compared to $\ddot{\mu}_{h}^d$.
  • ...and 2 more figures
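
The percentage improvements quoted above (for example in the Figure 5 caption) can be read as a relative reduction in mean absolute error (MAE) against a reference estimate. The sketch below shows that arithmetic under stated assumptions: the reference is taken to be a constant prediction for every unseen dataset, which may differ from the baseline used in the paper, and the accuracy values and function names are invented for illustration rather than taken from the results.

```python
from statistics import fmean


def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def relative_improvement(y_true, y_model, y_baseline):
    """Percentage by which the model's MAE is lower than the baseline's MAE."""
    base = mae(y_true, y_baseline)
    if base == 0:
        raise ValueError("baseline MAE is zero; relative improvement is undefined")
    return 100.0 * (base - mae(y_true, y_model)) / base


# Illustrative numbers only: mean accuracy of one algorithm on three unseen datasets.
y_true = [0.80, 0.60, 0.90]
y_baseline = [0.70, 0.70, 0.70]  # assumed naive baseline: one constant guess
y_model = [0.75, 0.65, 0.80]     # hypothetical fingerprint-based estimates

print(round(relative_improvement(y_true, y_model, y_baseline), 2))
```

A positive value means the fingerprint-based estimate is closer to the observed performance than the reference; a negative value means the reference would have been the better guess.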