Privacy Drift: Evolving Privacy Concerns in Incremental Learning
Sayyed Farid Ahamed, Soumya Banerjee, Sandip Roy, Aayush Kapoor, Marc Vucovich, Kevin Choi, Abdul Rahman, Edward Bowen, Sachin Shetty
TL;DR
This paper introduces privacy drift, a framework paralleling concept drift to describe how private information leakage evolves during incremental training in Federated Learning. It investigates how data drift, model evolution, and attack-development dynamics influence membership inference attack (MIA) risk, using CIFAR-20 non-IID partitions and incremental training to reveal non-monotonic privacy behavior. The study demonstrates a persistent correlation between model accuracy and privacy leakage, measured by MIA AUC, under both centralized and federated settings, highlighting that improving performance can elevate privacy risk. The findings motivate privacy-aware strategies, such as differential privacy and secure aggregation, to balance accuracy and privacy in dynamic, decentralized learning environments.
Abstract
In the evolving landscape of machine learning (ML), Federated Learning (FL) presents a paradigm shift towards decentralized model training while preserving user data privacy. This paper introduces the concept of "privacy drift", an innovative framework that parallels the well-known phenomenon of concept drift. While concept drift addresses the variability in model accuracy over time due to changes in the data, privacy drift encapsulates the variation in the leakage of private information as models undergo incremental training. By defining and examining privacy drift, this study aims to unveil the nuanced relationship between the evolution of model performance and the integrity of data privacy. Through rigorous experimentation, we investigate the dynamics of privacy drift in FL systems, focusing on how model updates and data distribution shifts influence the susceptibility of models to privacy attacks, such as membership inference attacks (MIA). Our results highlight a complex interplay between model accuracy and privacy safeguards, revealing that enhancements in model performance can lead to increased privacy risks. We provide empirical evidence from experiments on customized datasets derived from CIFAR-100 (Canadian Institute for Advanced Research, 100 classes), showcasing the impact of data and concept drift on privacy. This work lays the groundwork for future research on privacy-aware machine learning, aiming to achieve a delicate balance between model accuracy and data privacy in decentralized environments.
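As a rough illustration of how leakage could be tracked across incremental training stages, the sketch below computes a membership inference AUC from per-sample attack scores and then takes stage-to-stage differences. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `mia_auc` and `privacy_drift`, the use of a generic attack score (for example, negative per-sample loss), and the definition of drift as a simple difference of AUC values are all hypothetical choices made for this example.

```python
import numpy as np


def mia_auc(member_scores, nonmember_scores):
    """AUC of a score-based membership inference attack.

    Assumption: a higher score means "more likely a training member"
    (e.g. negative per-sample loss). The AUC is the probability that a
    random member outscores a random non-member, with ties counted as 0.5.
    """
    m = np.asarray(member_scores, dtype=float)
    n = np.asarray(nonmember_scores, dtype=float)
    # Compare every member score against every non-member score.
    diff = m[:, None] - n[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def privacy_drift(auc_per_stage):
    """Hypothetical drift measure: change in MIA AUC between consecutive
    incremental training stages. Positive values mean leakage increased."""
    return np.diff(np.asarray(auc_per_stage, dtype=float))
```

An AUC near 0.5 means the attack cannot separate members from non-members, while values approaching 1.0 indicate strong leakage. Computing `privacy_drift` over the AUC values recorded after each training increment gives a sequence whose sign changes would reflect the non-monotonic behavior described above.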
