A method to benchmark high-dimensional process drift detection

Edgar Wolf; Tobias Windisch

TL;DR

This work addresses benchmarking drift detection for high-dimensional process curves by introducing a controllable data-generation framework based on moving support points in curve space, which yields ground-truth drift segments. It defines a temporal performance metric, TAUC (and a soft variant, sTAUC), that evaluates detectors by integrating false positives against temporal overlap with true drift segments, capturing temporal context that traditional AUC misses. The authors provide a modular detector pipeline and present a small benchmark showing that autoencoder-based and multivariate tests improve TAUC, while many existing methods falter on datasets with multiple drift segments. The driftbench framework and TAUC/SOLS metrics offer a reproducible, scalable basis for developing and comparing drift detectors, with practical impact for manufacturing and quality control.

Abstract

Process curves are multivariate finite time series data coming from manufacturing processes. This paper studies machine learning methods that detect drifts in process curve datasets. A theoretical framework for synthetically generating process curves in a controlled way is introduced in order to benchmark machine learning algorithms for process drift detection. An evaluation score, called the temporal area under the curve, is introduced, which quantifies how well machine learning models unveil curves belonging to drift segments. Finally, a benchmark study comparing popular machine learning approaches on synthetic data generated with the introduced framework is presented; it shows that existing algorithms often struggle with datasets containing multiple drift segments.
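To make the idea of a temporal area under the curve more concrete, the following Python sketch sweeps a threshold over detector scores and integrates a temporal overlap value against the false positive rate. It is only one plausible reading of the summary above and not the paper's definition: the overlap used here (the mean fraction of each true drift segment that is flagged) and all function names are assumptions made for illustration.

```python
import numpy as np

def segments(mask):
    """Return (start, end) index pairs, end exclusive, for each run of True."""
    padded = np.concatenate(([0], mask.astype(int), [0]))
    diff = np.diff(padded)
    return list(zip(np.flatnonzero(diff == 1), np.flatnonzero(diff == -1)))

def temporal_overlap(pred, truth):
    """ASSUMPTION: mean fraction of each true drift segment that is flagged."""
    segs = segments(truth)
    if not segs:
        return 0.0
    return float(np.mean([pred[a:b].mean() for a, b in segs]))

def false_positive_rate(pred, truth):
    """Fraction of non-drift curves that are flagged as drift."""
    negatives = ~truth
    return float(pred[negatives].mean()) if negatives.any() else 0.0

def tauc_sketch(scores, truth):
    """Integrate overlap over the false positive rate across all thresholds."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    # Descending thresholds: predictions only grow, so both rates are monotone.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1], [-np.inf]))
    fpr, ovl = [], []
    for tau in thresholds:
        pred = scores >= tau
        fpr.append(false_positive_rate(pred, truth))
        ovl.append(temporal_overlap(pred, truth))
    fpr, ovl = np.array(fpr), np.array(ovl)
    # Trapezoidal rule, written out to avoid NumPy version differences.
    return float(np.sum(np.diff(fpr) * (ovl[1:] + ovl[:-1]) / 2.0))

# Toy ground truth with two drift segments among 20 curves.
truth = np.zeros(20, dtype=bool)
truth[5:10] = True
truth[14:17] = True
perfect = tauc_sketch(truth.astype(float), truth)
inverted = tauc_sketch(1.0 - truth.astype(float), truth)
```

Because the overlap is averaged per segment rather than per curve, a short drift segment counts as much as a long one in this sketch, which is one way a segment-aware score can differ from a pointwise AUC.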

Paper Structure (20 sections, 14 equations, 23 figures, 2 algorithms)

Introduction
Statistical framework to model process drifts
Data generation
The temporal area under the curve
Experiments
Algorithms
Feature extraction
Windowing and aggregation
Score computing
Algorithm Overview
Datasets
Results
Conclusion
Predictions of detectors of benchmark study
Data generation with polynomials
...and 5 more sections

Figures (23)

Figure 1: Overview of different kinds of time series data from manufacturing processes and drifts within.
Figure 2: Samples from a process curve (left) as well as a sequence of curve samples (right).
Figure 3: Visualization of the data synthesis given a function $f(w, x)=\sum_{i=0}^5w_i\cdot x^i$. The left panel shows $f(w,\cdot)$ solved for concrete $x^i,y^i$ (red points). The right panel shows the sequence $f(w_1, \cdot),\ldots, f(w_{100},\cdot)$, where Gaussian noise was added to one coordinate in $y^1(t)$ (the corresponding coordinate $x^1(t)$ is marked with a dashed line).
Figure 4: Short overview of our notation.
Figure 5: Applying a process drift detector to each process curve yields a score $s$, which needs to be compared to the ground truth $\mathcal{D}$ for each threshold $\tau$.
...and 18 more figures
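
The synthesis described in the Figure 3 caption can be illustrated with a short Python sketch: a degree-5 polynomial $f(w, x)=\sum_{i=0}^5w_i\cdot x^i$ is solved for a set of support points, and a sequence of 100 curves is produced by adding Gaussian noise to the y-coordinate of one support point. The support point values, the noise scale, and the function names below are hypothetical choices for illustration, not values from the paper or the driftbench API.

```python
import numpy as np

def fit_weights(x_support, y_support, degree=5):
    """Solve for w such that sum_i w_i * x**i passes through the support points."""
    vandermonde = np.vander(x_support, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(vandermonde, y_support, rcond=None)
    return w

def evaluate(w, x):
    """Evaluate f(w, x) = sum_i w_i * x**i on an array of x values."""
    return np.vander(x, len(w), increasing=True) @ w

rng = np.random.default_rng(0)

# Hypothetical support points: six points determine a degree-5 polynomial.
x_support = np.linspace(0.0, 1.0, 6)
y_base = np.array([0.0, 0.8, 1.0, 0.6, 0.9, 0.2])

n_curves = 100
noisy_index = 1      # the single support point whose y-coordinate is perturbed
noise_scale = 0.05   # assumed standard deviation of the Gaussian noise

weights = []
for t in range(n_curves):
    y_t = y_base.copy()
    y_t[noisy_index] += rng.normal(scale=noise_scale)
    weights.append(fit_weights(x_support, y_t))
weights = np.stack(weights)

# Sample every curve on a common grid to obtain a dataset of process curves.
grid = np.linspace(0.0, 1.0, 200)
curves = np.stack([evaluate(w, grid) for w in weights])
```

Replacing the noise term with a value that changes systematically over `t` would move the support point over time, which is the kind of controlled behavior the framework uses to create drift segments with known ground truth.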

Theorems & Definitions (5)

Definition 2.1: Support points
Remark 3.1: Multivariate data
Remark 3.2: Profile data
Definition 4.1: Drift segments
Example 4.2
