datadriftR: An R Package for Concept Drift Detection in Predictive Models

Ugur Dar, Mustafa Cavus

TL;DR

Concept drift, especially changes in the X–Y relationship, undermines predictive performance in evolving data streams. The paper presents datadriftR and Profile Drift Detection (PDD), a PDP-based, explainable drift detector designed for real-time monitoring in MLOps, combining three PDP-based metrics to detect drift and reveal underlying causes. PDD leverages partial dependence profiles to compare train–test relationships via $PDI$, $L_2$, and $L_{2\text{der}}$, retraining models when drift is detected and offering PDP visualizations to aid interpretation. Compared with conventional detectors like KSWIN and EDDM, PDD provides a balanced trade-off between accuracy and drift sensitivity, delivering actionable explanations while controlling false positives across diverse synthetic and real-world datasets. The work contributes an R package that unifies multiple drift detectors with PDP-based explainability, discusses limitations such as computational cost and multi-class applicability, and proposes future directions including compression-based PDP computation to scale to larger, more complex streams.

Abstract

Predictive models often face performance degradation due to evolving data distributions, a phenomenon known as data drift. Among its forms, concept drift, where the relationship between explanatory variables and the response variable changes, is particularly challenging to detect and adapt to. Traditional drift detection methods often rely on metrics such as accuracy or variable distributions, which may fail to capture subtle but significant conceptual changes. This paper introduces drifter, an R package designed to detect concept drift, and proposes a novel method called Profile Drift Detection (PDD) that enables both drift detection and an enhanced understanding of the cause behind the drift by leveraging an explainable AI tool, Partial Dependence Profiles (PDPs). The PDD method, central to the package, quantifies changes in PDPs through novel metrics, ensuring sensitivity to shifts in the data stream without excessive computational costs. This approach aligns with MLOps practices, emphasizing model monitoring and adaptive retraining in dynamic environments. Experiments across synthetic and real-world datasets demonstrate that PDD outperforms existing methods by maintaining high accuracy while effectively balancing sensitivity and stability. The results highlight its capability to adaptively retrain models in dynamic environments, making it a robust tool for real-time applications. The paper concludes by discussing the advantages, limitations, and future extensions of the package for broader use cases.
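
The idea of quantifying changes in PDPs can be illustrated with a minimal sketch. The Python code below is not the datadriftR API (the package is written in R), and the function names partial_dependence_profile and profile_distances are hypothetical. The three quantities are assumed interpretations of the metric names used above: an L2 distance between two profiles, an L2 distance between their numerical derivatives, and a disparity index taken here to be the share of grid points where the derivative signs disagree. The exact definitions in the paper may differ.

```python
import numpy as np

def partial_dependence_profile(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    profile = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # overwrite one feature for all rows
        profile.append(predict(X_mod).mean())
    return np.array(profile)

def profile_distances(pdp_a, pdp_b, grid):
    """Compare two profiles on a shared grid (assumed metric definitions)."""
    der_a = np.gradient(pdp_a, grid)       # numerical derivative of each profile
    der_b = np.gradient(pdp_b, grid)
    l2 = np.sqrt(np.mean((pdp_a - pdp_b) ** 2))
    l2_der = np.sqrt(np.mean((der_a - der_b) ** 2))
    # Assumption: disparity index = fraction of grid points with opposite slope signs.
    pdi = np.mean(np.sign(der_a) != np.sign(der_b))
    return {"pdi": float(pdi), "l2": float(l2), "l2_der": float(l2_der)}

# Toy stand-ins for a model before and after drift: the effect of feature 0 flips sign.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
grid = np.linspace(0.0, 1.0, 11)

def model_before(data):
    return data[:, 0]

def model_after(data):
    return -data[:, 0]

pdp_before = partial_dependence_profile(model_before, X, 0, grid)
pdp_after = partial_dependence_profile(model_after, X, 0, grid)

same = profile_distances(pdp_before, pdp_before, grid)
flipped = profile_distances(pdp_before, pdp_after, grid)
```

In this toy setup, comparing a profile with itself gives zero for all three quantities, while the sign-flipped model yields a disparity index of 1 because the slopes disagree at every grid point. A detector built on this idea would compare such values against thresholds to decide when to retrain; how those thresholds are chosen is left open here.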

Paper Structure

This paper contains 17 sections, 14 equations, 6 figures, 10 tables, 1 algorithm.

Figures (6)

  • Figure 1: The predictive modeling process
  • Figure 2: The workflow of the Profile Drift Detection
  • Figure 3: PDPs of the Logistic Regression model trained on SEA dataset
  • Figure 4: PDPs of the Decision Tree model trained on SEA dataset
  • Figure 5: PDPs of the Random Forest model trained on SEA dataset
  • ...and 1 more figure