Concept Drift Detection using Ensemble of Integrally Private Models
Ayush K. Varshney, Vicenc Torra
TL;DR
This work tackles private concept drift detection in streaming data by introducing Integrally Private Drift Detection (IPDD), which uses an ensemble of $\Delta$-integrally private models to estimate prediction uncertainty and detect drift without ground-truth labels. Drift signals are produced through entropy-based uncertainty measures and are monitored with ADWIN, triggering label queries only when drift is detected to refresh the private model ensemble. A probabilistic theoretical analysis provides bounds on model recurrence across disjoint datasets, showing that increasing minibatch size $b$, distance $\Delta$, and the number of datasets $m$ enhances $k$-anonymity IP. Empirical results on real and synthetic datasets demonstrate that IPDD achieves competitive or superior accuracy, MCC, and AUC compared to ADWIN baselines and differentially private models, while maintaining stronger privacy properties. The approach is complemented by openly available source code for replication and further exploration of private drift detection in streaming contexts.
Abstract
Deep neural networks (DNNs) are among the most widely used machine learning algorithms. DNNs require the training data, with true labels, to be available beforehand. This is not feasible for many real-world problems where data arrives as a stream and true labels are scarce and expensive to acquire. The literature has paid little attention to the privacy aspects of streaming data, where the data distribution may change frequently. These concept drifts must be detected privately in order to avoid any disclosure risk from DNNs. Existing privacy models use concept drift detection schemes such as ADWIN and KSWIN to detect drifts. In this paper, we focus on the notion of integrally private DNNs to detect concept drifts. Integrally private DNNs are models that recur frequently from different datasets. Based on this, we introduce an ensemble methodology, which we call the 'Integrally Private Drift Detection' (IPDD) method, to detect concept drift from private models. Our IPDD method does not require labels to detect drift but assumes true labels are available once the drift has been detected. We have experimented with binary and multi-class synthetic and real-world data. Our experimental results show that our methodology can privately detect concept drift, has utility comparable to (and in some cases better than) ADWIN, and outperforms differentially private models at different privacy levels in utility. The source code for the paper is available \hyperlink{https://github.com/Ayush-Umu/Concept-drift-detection-Using-Integrally-private-models}{here}.
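The label-free detection loop summarized above (ensemble prediction uncertainty fed to a change detector) can be illustrated with a minimal sketch. It is not the authors' implementation: the names `predictive_entropy` and `WindowMeanShiftDetector` are hypothetical, the entropy of the ensemble-averaged class distribution is only one plausible entropy-based uncertainty measure, and the two-window mean comparison is a deliberately simplified stand-in for ADWIN. The window size and threshold are illustrative values, not settings from the paper.

```python
import math
from collections import deque


def predictive_entropy(member_probs):
    """Shannon entropy (in nats) of the ensemble-averaged class distribution.

    member_probs: list of probability vectors, one per ensemble member,
    all for the same unlabeled input. (Assumed uncertainty measure.)
    """
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_models for c in range(n_classes)]
    return -sum(q * math.log(q) for q in mean if q > 0.0)


class WindowMeanShiftDetector:
    """Simplified stand-in for ADWIN (assumption, not the ADWIN algorithm).

    Fills a fixed reference window first, then keeps a sliding window of
    recent values and flags drift when the two window means differ by
    more than `threshold`.
    """

    def __init__(self, window=50, threshold=0.2):
        self.reference = deque(maxlen=window)
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value):
        """Add one uncertainty value; return True if drift is flagged."""
        if len(self.reference) < self.reference.maxlen:
            self.reference.append(value)
            return False
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False
        ref_mean = sum(self.reference) / len(self.reference)
        rec_mean = sum(self.recent) / len(self.recent)
        if abs(rec_mean - ref_mean) > self.threshold:
            # Drift flagged: the recent window becomes the new reference.
            self.reference = deque(self.recent, maxlen=self.recent.maxlen)
            self.recent.clear()
            return True
        return False
```

In a streaming setting, each incoming unlabeled sample would be scored by every ensemble member, the resulting entropy passed to `update`, and true labels requested to refresh the ensemble only when `update` returns `True`.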
