A method to benchmark high-dimensional process drift detection

Edgar Wolf, Tobias Windisch

TL;DR

This work addresses benchmarking drift detection for high‑dimensional process curves by introducing a controllable data-generation framework based on moving support points in curve space, which yields ground-truth drift segments. It defines a temporal performance metric, TAUC (and a soft variant, sTAUC), that evaluates detectors by integrating false positives against temporal overlap with true drift segments, capturing temporal context that traditional AUC misses. The authors provide a modular detector pipeline and a small benchmark in which autoencoder‑based and multivariate tests improve TAUC, while many existing methods falter with multiple drift segments. The driftbench framework and TAUC/SOLS metrics offer a reproducible, scalable basis for developing and comparing drift detectors, with practical impact for manufacturing and quality control.

Abstract

Process curves are multivariate finite time series data coming from manufacturing processes. This paper studies machine learning methods that detect drifts in process curve datasets. A theoretical framework to synthetically generate process curves in a controlled way is introduced in order to benchmark machine learning algorithms for process drift detection. An evaluation score, called the temporal area under the curve, is introduced, which quantifies how well machine learning models uncover curves belonging to drift segments. Finally, a benchmark study comparing popular machine learning approaches on synthetic data generated with the introduced framework is presented; it shows that existing algorithms often struggle with datasets containing multiple drift segments.

Paper Structure (20 sections, 14 equations, 23 figures, 2 algorithms)

Figures (23)

  • Figure 1: Overview of different kinds of time series data from manufacturing processes and drifts within.
  • Figure 2: Samples from a process curve (left) as well as a sequence of curve samples (right).
  • Figure 3: Visualization of the data synthesis given a function $f(w, x)=\sum_{i=0}^5w_i\cdot x^i$. The left figure shows $f(w,\cdot)$ solved for concrete $x^i,y^i$ (red points). The right figure shows the sequence $f(w_1, \cdot),\ldots, f(w_{100},\cdot)$, where Gaussian noise was added to one coordinate in $y^1(t)$ (the corresponding coordinate $x^1(t)$ is marked with a dashed line).
  • Figure 4: Short overview of our notation.
  • Figure 5: Applying a process drift detector to each process curve yields a score $s$, which needs to be compared to the ground truth $\mathcal{D}$ for each threshold $\tau$.
  • ...and 18 more figures
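
The synthesis described in the caption of Figure 3 can be illustrated with a minimal sketch. It assumes that the support points only constrain function values (no derivative conditions), so that six support points determine the degree-5 polynomial $f(w, x)=\sum_{i=0}^5 w_i\cdot x^i$ exactly through a linear solve. The support point values, the noise scale, and the helper names `solve_weights` and `f` are hypothetical choices for illustration and are not taken from the paper or from driftbench.

```python
import numpy as np


def solve_weights(x_support, y_support):
    """Solve f(w, x_j) = y_j for w, where f(w, x) = sum_i w_i * x**i."""
    # Vandermonde matrix with columns x**0, x**1, ..., x**5
    V = np.vander(x_support, N=len(x_support), increasing=True)
    return np.linalg.solve(V, y_support)


def f(w, x):
    """Evaluate the polynomial with weights w (lowest degree first) at x."""
    # np.polyval expects the highest-degree coefficient first
    return np.polyval(w[::-1], x)


rng = np.random.default_rng(0)

# Hypothetical support points: six x-positions and their target y-values
x_support = np.linspace(0.0, 1.0, 6)
y_support = np.array([0.0, 0.8, 1.0, 0.6, 0.9, 0.2])

T = 100            # number of curves in the sequence
noisy_index = 2    # the single y-coordinate that receives Gaussian noise
noise_scale = 0.05 # assumed standard deviation of the noise

weights = []
for t in range(T):
    y_t = y_support.copy()
    y_t[noisy_index] += rng.normal(scale=noise_scale)
    weights.append(solve_weights(x_support, y_t))
weights = np.stack(weights)  # shape (T, 6): one weight vector w_t per curve

# Sample every curve f(w_t, .) on a common grid
x_grid = np.linspace(0.0, 1.0, 200)
curves = np.stack([f(w, x_grid) for w in weights])  # shape (T, 200)
```

In this sketch every curve still passes through the five unperturbed support points, while the curve value at `x_support[noisy_index]` varies from curve to curve. Replacing the random noise with a systematic shift of a support point over a range of `t` would be one way to obtain a drift segment with known start and end.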

Theorems & Definitions (5)

  • Definition 2.1: Support points
  • Remark 3.1: Multivariate data
  • Remark 3.2: Profile data
  • Definition 4.1: Drift segments
  • Example 4.2