OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User

Sarah Alnegheimish, Laure Berti-Equille, Kalyan Veeramachaneni

TL;DR

This work proposes OrionBench, an end-user centric, continuously maintained benchmarking framework for unsupervised time series anomaly detection models. It provides universal abstractions to represent models, extensibility to add new pipelines and datasets, hyperparameter standardization, pipeline verification, and frequent releases with published updates of the benchmark.

Abstract

Time series anomaly detection is a vital task in many domains, including patient monitoring in healthcare, forecasting in finance, and predictive maintenance in energy industries. This has led to a proliferation of anomaly detection methods, including deep learning-based methods. Benchmarks are essential for comparing the performance of these models as they emerge, in a fair, rigorous, and reproducible manner. Although several benchmarks for comparing models have been proposed, these usually rely on a one-time execution over a limited set of datasets, with comparisons restricted to a few models. We propose OrionBench: an end-user centric, continuously maintained benchmarking framework for unsupervised time series anomaly detection models. Our framework provides universal abstractions to represent models, hyperparameter standardization, extensibility to add new pipelines and datasets, pipeline verification, and frequent releases with published updates of the benchmark. We demonstrate how to use OrionBench and report the performance of pipelines across 17 releases published over the course of four years. We also walk through two real scenarios we experienced with OrionBench that highlight the importance of continuous benchmarking for unsupervised time series anomaly detection.

Paper Structure (29 sections, 1 equation, 13 figures, 22 tables)

Figures (13)

  • Figure 1: Typically, researchers and end-users follow independent processes. Researchers develop a method and benchmark it in order to publish their papers. Once these methods are published, end-users must first understand the model and then adapt the code to work on their own data. After testing it, end-users decide whether its performance is sufficient for deployment. With OrionBench, we aim to provide a single hub where researchers can benchmark their pipelines, which then become immediately available to end-users.
  • Figure 2: OrionBench integrates new models made by ML researchers and compares their performance to that of currently available models through the leaderboard. After the validity and reproducibility of a model are tested, it is transferred from "sandbox" to "verified" and becomes readily available to the end-user.
  • Figure 3: Example of the LSTM DT pipeline. (a) Graph representation of the pipeline showing its primitives and data flow. (b) Python usage example. (c) Subset of the pipeline's hyperparameter configuration in JSON format.
  • Figure 4: Distribution of F1 scores across NASA, NAB, Yahoo S5, and UCR. Yahoo S5 was split into two subsets to highlight the difference in F1 score that pipelines exhibit when detecting point anomalies.
  • Figure 5: Benchmark command in Python. Running benchmark() with default settings will execute the benchmark on all pipelines and datasets currently integrated.
  • ...and 8 more figures