WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection

Alexander Stepikin, Evgenia Romanenkova, Alexey Zaytsev

TL;DR

This work tackles change point detection in high-dimensional data by moving beyond standalone neural detectors to a task-specific ensemble aggregation. It introduces WWAggr, a sliding-window aggregation based on the 1-Wasserstein distance that preserves ensemble disagreement signals and reduces reliance on precise threshold tuning. Coupled with post-hoc beta calibration, the approach enables robust online inference with thresholds near $h \approx 0.5$. On the Yahoo, Explosions, and Road Accidents data, WWAggr consistently surpasses naive aggregations, improving $F_1$ by up to about 20% and establishing new state-of-the-art performance in challenging settings.

Abstract

Change Point Detection (CPD) aims to identify moments of abrupt distribution shift in data streams. Real-world high-dimensional CPD remains challenging because data patterns are complex and common assumptions are often violated. Current state-of-the-art detectors rely on standalone deep neural networks and have yet to achieve perfect quality. Ensembling, meanwhile, provides more robust solutions and boosts performance. In this paper, we investigate ensembles of deep change point detectors and find that standard prediction aggregation techniques, e.g., averaging, are suboptimal and fail to account for the specifics of the problem. As an alternative, we introduce WWAggr, a novel task-specific method of ensemble aggregation based on the Wasserstein distance. Our procedure is versatile and works effectively with various ensembles of deep CPD models. Moreover, unlike existing solutions, it largely resolves in practice the long-standing problem of decision threshold selection for CPD.
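To make the idea of a window-based Wasserstein aggregation more concrete, here is a minimal sketch. It is not the authors' algorithm: the function names, the `half_window` parameter, and the choice to compare the older and newer halves of a trailing window are all assumptions made for illustration. The only grounded ingredients are the sliding window, the ensemble of per-step scores, and the 1-Wasserstein distance, which for two equal-size empirical samples equals the mean absolute difference of their sorted values.

```python
def wasserstein_1(a, b):
    """1-Wasserstein distance between two equal-size empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def window_wasserstein_aggregate(scores, half_window=2):
    """Hypothetical aggregation: scores[m][t] is the change point score of
    ensemble member m at step t, assumed to lie in [0, 1].

    At each step t, the scores of all members in the older half of the
    trailing window are pooled and compared with those in the newer half.
    Only past and current steps are used, so the sketch is online.
    """
    n_steps = len(scores[0])
    aggregated = []
    for t in range(n_steps):
        if t + 1 < 2 * half_window:
            aggregated.append(0.0)  # not enough history yet (assumed default)
            continue
        start = t + 1 - 2 * half_window
        mid = start + half_window
        older = [member[i] for member in scores for i in range(start, mid)]
        newer = [member[i] for member in scores for i in range(mid, t + 1)]
        aggregated.append(wasserstein_1(older, newer))
    return aggregated


# Toy ensemble of three detectors whose scores jump at step 4.
toy_scores = [
    [0.1] * 4 + [0.9] * 4,
    [0.0] * 4 + [1.0] * 4,
    [0.2] * 4 + [0.8] * 4,
]
toy_aggregated = window_wasserstein_aggregate(toy_scores, half_window=2)
```

Because the distance is computed on the pooled scores rather than on their mean, the spread of the ensemble within each half-window influences the result, which is one plausible reading of "preserving ensemble disagreement signals". In this sketch the aggregated value peaks one step after the jump, once the newer half-window lies entirely past it.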

Paper Structure

This paper contains 34 sections, 5 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: Teaser of our new aggregation procedure for effective high-dimensional change point detection. First, an ensemble of deep change point (CP) detectors predicts well-calibrated CP scores for each moment. Second, these scores are aggregated via WWAggr, our sliding-window Wasserstein procedure. Applied to the calibrated scores, WWAggr better reflects changes in ensemble predictions, yielding near-optimal performance for the pre-defined threshold of $0.5$.
  • Figure 2: Dependence of the $F_1$-score on threshold selection for the WWAggr procedure before and after model calibration. Dashed line indicates the best score obtained by searching through $300$ thresholds.
  • Figure 3: Histograms of the mean predicted "normal" and "abnormal" CP scores for ensembles of supervised BCE models trained on the Explosions dataset. $W_{1}$ represents the 1-Wasserstein distance estimate between these two distributions.
  • Figure 4: Dependence of the $F_1$-score on threshold selection for our aggregation procedure with different probabilistic distances. Results for the experiments with calibrated BCE ensembles on the video datasets. Dashed line indicates the best score obtained by searching through $300$ thresholds.
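The threshold search mentioned in the captions of Figures 2 and 4 can be sketched as a grid sweep. This is an illustrative assumption, not the paper's evaluation code: the function names are hypothetical, the grid is taken to be evenly spaced in $(0, 1)$, and a plain pointwise $F_1$ stands in for whatever change-point-specific $F_1$ the paper uses.

```python
def f1_pointwise(pred, target):
    """Pointwise F1 between boolean predictions and binary labels.

    A stand-in metric for illustration; CPD papers typically use a
    detection-level F1 with their own matching rules.
    """
    tp = sum(1 for p, y in zip(pred, target) if p and y)
    fp = sum(1 for p, y in zip(pred, target) if p and not y)
    fn = sum(1 for p, y in zip(pred, target) if not p and y)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sweep_thresholds(scores, target, n_thresholds=300):
    """Return (best_threshold, best_f1) over an evenly spaced grid in (0, 1)."""
    best_h, best_f1 = None, -1.0
    for k in range(1, n_thresholds + 1):
        h = k / (n_thresholds + 1)
        f1 = f1_pointwise([s > h for s in scores], target)
        if f1 > best_f1:  # ties keep the smallest threshold
            best_h, best_f1 = h, f1
    return best_h, best_f1


# Toy scores that are already well separated around 0.5.
toy_scores = [0.05, 0.1, 0.2, 0.7, 0.9, 0.95]
toy_target = [0, 0, 0, 1, 1, 1]
best_h, best_f1 = sweep_thresholds(toy_scores, toy_target)
fixed_f1 = f1_pointwise([s > 0.5 for s in toy_scores], toy_target)
```

Comparing `fixed_f1` with `best_f1` mirrors what the dashed line in the figures conveys: how much is lost by using the pre-defined threshold of $0.5$ instead of the best of the searched thresholds.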