Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis

Elias Werner, Nishant Kumar, Matthias Lieber, Sunna Torge, Stefan Gumhold, Wolfgang E. Nagel

TL;DR

This paper argues that unsupervised concept drift detectors must be evaluated for computational performance in addition to inference quality, particularly for large-scale and real-time applications. It introduces a performance engineering framework comprising complexity analysis, benchmark design, and performance analysis. It provides time and space complexities for a set of detectors and demonstrates HPC-style tracing on two detectors (IKS and STUDD) using a large dataset. The authors propose benchmark design considerations (dataset diversity, model dependency, baselines, and metrics) to enable robust, resource-aware evaluation, and they highlight the need for reproducible implementations. Overall, the work lays out a practical path toward resource-efficient, scalable unsupervised drift detection by coupling theoretical complexity with empirical performance analysis using HPC tools such as Score-P and Vampir.

Abstract

Concept drift detection is crucial for many AI systems to ensure the system's reliability. These systems often have to deal with large amounts of data or react in real time. Thus, drift detectors must meet computational requirements or constraints, which calls for a comprehensive performance evaluation. However, so far, the development of drift detectors has focused on inference quality, e.g. accuracy, but not on computational performance, such as runtime. Many previous works consider computational performance only as a secondary objective and lack a benchmark for such an evaluation. Hence, we propose and explain performance engineering for unsupervised concept drift detection that reflects on computational complexities, benchmarking, and performance analysis. We provide the computational complexities of existing unsupervised drift detectors and discuss why further computational performance investigations are required. Furthermore, we state and substantiate the aspects of a benchmark for unsupervised drift detection that reflects both inference quality and computational performance. Finally, we demonstrate performance analysis practices that have proven their effectiveness in High-Performance Computing by tracing two drift detectors and displaying their performance data.
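The two computational metrics the benchmark reports alongside accuracy in Figure 1 are runtime and peak memory. As a minimal sketch of how both can be captured for a single detector call in pure Python, the standard-library `time.perf_counter` and `tracemalloc` modules are enough. This is not the Score-P/Vampir workflow the paper uses, and `profile_call` and `toy_mean_shift_detector` are hypothetical names chosen for illustration; the toy detector is a stand-in, not one of the detectors studied.

```python
import time
import tracemalloc
from typing import Any, Callable, Sequence, Tuple


def profile_call(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float, int]:
    """Run fn(*args) once and return (result, wall-clock seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args)
    finally:
        elapsed = time.perf_counter() - start
        # Peak size of the memory blocks traced by tracemalloc since start().
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def toy_mean_shift_detector(
    reference: Sequence[float], window: Sequence[float], threshold: float = 0.5
) -> bool:
    """Hypothetical stand-in detector: flags drift when the window mean moves
    more than `threshold` away from the reference mean."""
    ref_mean = sum(reference) / len(reference)
    win_mean = sum(window) / len(window)
    return abs(win_mean - ref_mean) > threshold


if __name__ == "__main__":
    drift, seconds, peak_bytes = profile_call(
        toy_mean_shift_detector, [0.0] * 1000, [1.0] * 1000
    )
    print(f"drift={drift} runtime={seconds:.6f}s peak={peak_bytes} bytes")
```

`tracemalloc` only sees allocations routed through Python's allocator, so memory held by native extensions may be under-reported, and a single timed call is noisy; repeated runs give more stable runtime figures.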

Paper Structure

This paper contains 12 sections, 1 equation, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Accuracy, amount of labels, runtime, and peak memory of the pipelines IKS, STUDD, BL1, and BL2 on Forest Covertype dataset.
  • Figure 2: Accuracy and runtime behavior while processing streams over different datasets. The X-axis indicates the number of samples after which re-training is conducted. The right (blue) Y-axis shows runtime and the left (green) Y-axis shows accuracy. Gas, Electricity, and Abrupt Insects were initialized with $n=1$ and sampled in steps of 5; Forest Covertype and Airlines were initialized with $n=20$ and sampled in steps of 100.
  • Figure 3: Vampir display of the IKS (left) and STUDD (right) performance data on the Forest Covertype dataset. In each display, the left part shows the call tree and the right part shows the function summary, i.e. the accumulated exclusive time per function. User regions for the inference process are yellow, user regions for the drift detection are green, and user regions for maintaining the buffers and the model are red. sklearn functions are blue. The remaining functions are shaded grey.
  • Figure 4: Vampir display of the IKS with the timeline feature, i.e. the chronological sequence of the called functions. We selected the time window to show the functions called by the IKS per sample, i.e. between inference steps (yellow). The top display shows the master timeline and the bottom display the call stack. Treap.SplitKeepRight is light green, Treap.KeepGreatest cyan, Treap.KeepSmallest orange, Treap.Merge dark blue, and other Treap functions are purple.
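Figure 4 attributes the per-sample work of IKS to treap operations. IKS, as its name suggests, maintains a Kolmogorov-Smirnov statistic incrementally, and the Treap.* functions in the trace are consistent with a treap-based implementation. For contrast, the following minimal sketch shows the naive alternative an incremental structure avoids: recomputing the two-sample KS statistic from scratch after every new sample, which costs O(w log w) per step for a window of w samples because sorting dominates. This is not the IKS algorithm; the function names, the sliding-window loop, and the use of the asymptotic critical value as a threshold are illustrative assumptions.

```python
import math
from bisect import bisect_right
from collections import deque
from typing import Iterable, List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)  # O(n log n + m log m)
    n, m = len(sa), len(sb)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions that only jump at
    # observed values, so checking every pooled observation finds the supremum.
    for x in sa + sb:
        fa = bisect_right(sa, x) / n
        fb = bisect_right(sb, x) / m
        d = max(d, abs(fa - fb))
    return d


def ks_threshold(n: int, m: int, alpha: float = 0.05) -> float:
    """Asymptotic critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))


def naive_sliding_ks(
    stream: Iterable[float],
    reference: Sequence[float],
    window_size: int,
    threshold: float,
) -> List[int]:
    """Return stream indices at which D(reference, window) exceeds threshold.

    D is recomputed from scratch after every sample, i.e. O(w log w) per step
    for a window of w samples (assuming a reference of comparable size).
    """
    window: deque = deque(maxlen=window_size)  # oldest sample drops out automatically
    alarms: List[int] = []
    for i, x in enumerate(stream):
        window.append(x)
        if len(window) == window_size and ks_statistic(reference, list(window)) > threshold:
            alarms.append(i)
    return alarms


if __name__ == "__main__":
    reference = [0.0] * 5
    stream = [0.0] * 5 + [10.0] * 5
    print(naive_sliding_ks(stream, reference, window_size=5, threshold=0.5))  # [7, 8, 9]
```

Each new sample changes the window by one insertion and one removal, yet the naive loop re-sorts everything; a balanced search tree such as a treap can, in principle, absorb such a change in logarithmic time, which is why its split/merge operations show up per sample in the IKS trace.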