Utilizing Data Fingerprints for Privacy-Preserving Algorithm Selection in Time Series Classification: Performance and Uncertainty Estimation on Unseen Datasets
Lars Böcking, Leopold Müller, Niklas Kühl
TL;DR
The paper tackles privacy concerns in algorithm selection for time series classification by introducing data fingerprints that summarize datasets without revealing data points. It reframes algorithm selection as a multi-target regression problem that maps dataset fingerprints to expected performance and uncertainty, enabling predictions on unseen data. Through extensive evaluation on 112 UCR datasets across 35 algorithms, the method achieves meaningful improvements over a naive baseline in both mean performance and uncertainty estimation, demonstrating a practical, privacy-preserving approach for AI service deployment. This work offers a scalable framework for informed algorithm selection in time series tasks, with potential extensions to additional objectives and interpretability for real-world service systems.
Abstract
The selection of algorithms is a crucial step in designing AI services for real-world time series classification use cases. Traditional methods such as neural architecture search, automated machine learning, and combined algorithm selection and hyperparameter optimization are effective but require considerable computational resources and access to all data points to run their optimizations. In this work, we introduce a novel data fingerprint that describes any time series classification dataset in a privacy-preserving manner and provides insight into the algorithm selection problem without requiring training on the (unseen) dataset. We decompose the multi-target regression problem so that only our data fingerprints are needed to estimate algorithm performance and uncertainty in a scalable and adaptable manner. We evaluate our approach on the 112 University of California, Riverside (UCR) benchmark datasets, demonstrating its effectiveness in predicting the performance of 35 state-of-the-art algorithms and providing valuable insights for effective algorithm selection in time series classification service systems. On average, it improves on a naive baseline by 7.32% in estimating the mean performance and by 15.81% in estimating the uncertainty.
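The decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the fingerprint features, the ridge regressor, the synthetic targets, and the choice of a training-set mean as the naive baseline are all assumptions made for illustration. Each target column (a mean-performance or uncertainty value for one algorithm) gets its own independent regressor that sees only fingerprint vectors, never raw time series.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def predict_decomposed(X_train, Y_train, X_new, alpha=1.0):
    """Fit one independent regressor per target column, then predict X_new."""
    preds = np.empty((X_new.shape[0], Y_train.shape[1]))
    for j in range(Y_train.shape[1]):
        w, b = fit_ridge(X_train, Y_train[:, j], alpha)
        preds[:, j] = X_new @ w + b
    return preds


def naive_baseline(Y_train, n_new):
    """Assumed baseline: predict the training-set mean of every target."""
    return np.tile(Y_train.mean(axis=0), (n_new, 1))


# Synthetic stand-ins (hypothetical): 40 datasets, 5 fingerprint features,
# 3 algorithms. Target columns are [mean score per algorithm, uncertainty per
# algorithm]; values are unbounded synthetic numbers, not real accuracies.
rng = np.random.default_rng(0)
n_datasets, n_features, n_algorithms = 40, 5, 3
X = rng.normal(size=(n_datasets, n_features))
W = rng.normal(size=(n_features, 2 * n_algorithms))
Y = X @ W + 0.1 * rng.normal(size=(n_datasets, 2 * n_algorithms))

# Treat the last 10 datasets as unseen: only their fingerprints are used.
X_train, X_test = X[:30], X[30:]
Y_train, Y_test = Y[:30], Y[30:]

pred = predict_decomposed(X_train, Y_train, X_test)
base = naive_baseline(Y_train, len(X_test))

mae_model = np.abs(pred - Y_test).mean()
mae_base = np.abs(base - Y_test).mean()
print(f"decomposed model MAE: {mae_model:.3f}, naive baseline MAE: {mae_base:.3f}")
```

Because the per-target regressors are independent, adding a new algorithm or a further objective only requires fitting additional columns, which is one plausible reading of the "scalable and adaptable" property claimed in the abstract.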
