Concept Drift Detection using Ensemble of Integrally Private Models

Ayush K. Varshney, Vicenc Torra

TL;DR

This work tackles private concept drift detection in streaming data by introducing Integrally Private Drift Detection (IPDD), which uses an ensemble of $\Delta$-integrally private models to estimate prediction uncertainty and detect drift without ground-truth labels. Drift signals are produced through entropy-based uncertainty measures and are monitored with ADWIN, triggering label queries only when drift is detected to refresh the private model ensemble. A probabilistic theoretical analysis provides bounds on model recurrence across disjoint datasets, showing that increasing minibatch size $b$, distance $\Delta$, and the number of datasets $m$ enhances $k$-anonymity IP. Empirical results on real and synthetic datasets demonstrate that IPDD achieves competitive or superior accuracy, MCC, and AUC compared to ADWIN baselines and differentially private models, while maintaining stronger privacy properties. The approach is complemented by openly available source code for replication and further exploration of private drift detection in streaming contexts.
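
The label-free detection loop summarized above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the function and class names are hypothetical, the ensemble outputs are simulated, and `WindowMeanShiftDetector` is a deliberately simplified stand-in for ADWIN (which uses adaptive windowing with a statistical bound rather than a fixed window and threshold). The window size and threshold are assumptions chosen for the toy stream.

```python
import math
from collections import deque


def predictive_entropy(member_probs):
    """Entropy of the ensemble-averaged class distribution for one sample.

    member_probs: one probability vector per ensemble member.
    """
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_models for c in range(n_classes)]
    return -sum(q * math.log(q) for q in mean if q > 0.0)


class WindowMeanShiftDetector:
    """Simplified stand-in for ADWIN (assumption, not the real algorithm).

    Signals drift when the mean of the most recent window differs from the
    mean of a fixed reference window by more than `threshold`.
    """

    def __init__(self, window=50, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.reference = deque(maxlen=window)
        self.recent = deque(maxlen=window)

    def update(self, value):
        # Fill the reference window first, then slide the recent window.
        if len(self.reference) < self.window:
            self.reference.append(value)
            return False
        self.recent.append(value)
        if len(self.recent) < self.window:
            return False
        gap = abs(sum(self.recent) / self.window - sum(self.reference) / self.window)
        if gap > self.threshold:
            # After a drift, start over so the reference is rebuilt from new data.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


def simulate_stream(n_samples=400, drift_at=200):
    """Toy stream: the ensemble is confident before `drift_at`, uncertain after."""
    detector = WindowMeanShiftDetector()
    detections = []
    for t in range(n_samples):
        if t < drift_at:
            member_probs = [[0.95, 0.05]] * 3
        else:
            member_probs = [[0.55, 0.45], [0.45, 0.55], [0.50, 0.50]]
        if detector.update(predictive_entropy(member_probs)):
            # In IPDD, this is the point where true labels would be requested
            # and the private ensemble refreshed.
            detections.append(t)
    return detections


if __name__ == "__main__":
    print("drift flagged at steps:", simulate_stream())
```

In this toy setup the uncertainty signal jumps once the ensemble members stop agreeing, and the detector flags the change shortly after the simulated drift point. No labels are consulted until a drift is flagged.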

Abstract

Deep neural networks (DNNs) are one of the most widely used classes of machine learning algorithms. DNNs require the training data to be available beforehand with true labels. This is not feasible for many real-world problems where data arrives as a stream and true labels are scarce and expensive to acquire. In the literature, little attention has been given to the privacy aspect of streaming data, where the data distribution may change frequently. These concept drifts must be detected privately in order to avoid any disclosure risk from DNNs. Existing privacy models use concept drift detection schemes such as ADWIN and KSWIN to detect drifts. In this paper, we focus on the notion of integrally private DNNs to detect concept drifts. Integrally private DNNs are models that recur frequently from different datasets. Based on this, we introduce an ensemble methodology, which we call the 'Integrally Private Drift Detection' (IPDD) method, to detect concept drift from private models. Our IPDD method does not require labels to detect drift but assumes that true labels are available once the drift has been detected. We have experimented with binary and multi-class synthetic and real-world data. Our experimental results show that our methodology can privately detect concept drift, has utility comparable to (and in some cases better than) ADWIN, and outperforms the utility of differentially private models at different privacy levels. The source code for the paper is available at https://github.com/Ayush-Umu/Concept-drift-detection-Using-Integrally-private-models.
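
To make the idea of models that "recur frequently from different datasets" concrete, here is a minimal sketch. It assumes that two models count as the same recurrence when their flattened weight vectors lie within Euclidean distance `delta` of a group representative, and that a group is kept only if at least `k` datasets produced it. The distance measure, the greedy grouping, and all names are illustrative assumptions, not the paper's procedure.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two flattened weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def recurrent_model_groups(models, delta, k):
    """Group models trained on different datasets by closeness in weight space.

    models: list of weight vectors, one per training dataset.
    Each model joins the first group whose representative is within `delta`;
    otherwise it starts a new group. Only groups generated by at least `k`
    datasets are returned, as (representative, dataset_indices) pairs.
    """
    groups = []
    for idx, weights in enumerate(models):
        for representative, members in groups:
            if l2_distance(representative, weights) <= delta:
                members.append(idx)
                break
        else:
            groups.append((weights, [idx]))
    return [(rep, members) for rep, members in groups if len(members) >= k]


# Toy weight vectors from five hypothetical datasets.
toy_models = [
    [1.00, 2.00],
    [1.01, 2.00],
    [0.99, 1.99],
    [5.00, 5.00],  # an outlier model that does not recur
    [1.00, 2.02],
]
private_groups = recurrent_model_groups(toy_models, delta=0.05, k=3)
```

With these toy values, four of the five datasets yield models close to `[1.00, 2.00]`, so that group passes the `k=3` threshold, while the outlier model is discarded because only one dataset generates it.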

Paper Structure (9 sections, 2 theorems, 5 equations, 7 figures, 1 table, 2 algorithms)

This paper contains 9 sections, 2 theorems, 5 equations, 7 figures, 1 table, 2 algorithms.

Key Result

Theorem

Let $D_1, D_2, \ldots, D_m$ be i.i.d. samples from the dataset $\mathcal{D}$ with some distribution, and let $b$ be the number of minibatches used for training in each of $T$ epochs. Then, under a similar training environment (i.e., same initialization, learning rate, etc.), with probability greater than $(\sum_{r=

Figures (7)

  • Figure 1: Types of drifts in the data
  • Figure 2: Flowchart drift detection using ensemble of $\Delta$-Integrally Private Models
  • Figure 3: Two models $M_j, M_k$ at most $\Delta^2$ distance apart from $\mu$ with probability defined in Eq. (5)
  • Figure 4: Drift detected by different $\epsilon$-differentially private models
  • Figure 5: Comparison of the accuracy score between differential privacy and integral privacy: (a) CovType (b) Electricity (c) Susy (d) Sine (e) Insect_ab (f) Insect_grad (g) Insect_incr.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2