Fed-AugMix: Balancing Privacy and Utility via Data Augmentation
Haoyang Li, Wei Chen, Xiaojin Zhang
TL;DR
This work tackles gradient leakage in federated learning (FL) by proposing Fed-AugMix, a client-side augmentation framework that uses AugMix to inject distortion into gradients and a Jensen-Shannon (JS) divergence loss to enforce prediction consistency across augmented views. By combining AugMix with Loss Scaling, the approach achieves a favorable privacy-utility trade-off, often preserving or even improving accuracy while substantially hindering data reconstruction attacks such as InvGrad. The key contributions are implementing AugMix at the client level, integrating a JS-divergence–based loss to safeguard privacy, and validating the method on MNIST and CIFAR datasets with multiple FL baselines. The results demonstrate robust privacy protection with competitive or superior performance, highlighting the practical potential of data augmentation as a privacy-preserving mechanism in federated learning.
Abstract
Gradient leakage attacks pose a significant threat to the privacy guarantees of federated learning. Distortion-based protection mechanisms are commonly used to mitigate this threat, but they often degrade model performance noticeably, and existing methods struggle to preserve utility while ensuring privacy. To address this challenge, we propose a novel data augmentation-based framework designed to achieve a favorable privacy-utility trade-off, with the potential to enhance model performance in certain cases. Our framework incorporates the AugMix algorithm at the client level, enabling data augmentation with controllable severity. By integrating the Jensen-Shannon (JS) divergence into the loss function, we embed the distortion introduced by AugMix into the model gradients, effectively safeguarding privacy against deep leakage attacks. Moreover, the JS divergence promotes prediction consistency across different augmentations of the same image, enhancing both robustness and performance. Extensive experiments on benchmark datasets demonstrate the effectiveness and stability of our method in protecting privacy. Furthermore, our approach maintains, and in some cases improves, model performance, showcasing its ability to achieve a robust privacy-utility trade-off.
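To make the loss described above concrete, the following is a minimal sketch of a Jensen-Shannon consistency term added to a cross-entropy loss. It assumes the three-view form used in the original AugMix formulation (one clean view plus two augmented views of the same image); the abstract does not state how many views Fed-AugMix uses. All function names and the `js_weight` parameter are hypothetical, chosen for illustration only, and the sketch uses plain Python rather than the authors' training code.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def js_consistency(p_clean, p_aug1, p_aug2):
    """Jensen-Shannon divergence among three predictive distributions.

    Computed as the mean KL divergence of each view from their average.
    """
    mixture = [(a + b + c) / 3.0 for a, b, c in zip(p_clean, p_aug1, p_aug2)]
    return (
        kl_divergence(p_clean, mixture)
        + kl_divergence(p_aug1, mixture)
        + kl_divergence(p_aug2, mixture)
    ) / 3.0


def augmix_style_loss(logits_clean, logits_aug1, logits_aug2, label, js_weight=1.0):
    """Cross-entropy on the clean view plus a weighted JS consistency term.

    js_weight is a hypothetical hyperparameter; the source gives no value.
    """
    p_clean = softmax(logits_clean)
    p_aug1 = softmax(logits_aug1)
    p_aug2 = softmax(logits_aug2)
    cross_entropy = -math.log(p_clean[label] + 1e-12)
    return cross_entropy + js_weight * js_consistency(p_clean, p_aug1, p_aug2)


if __name__ == "__main__":
    # Hypothetical logits for one image under a clean and two augmented views.
    clean = [2.0, 0.5, -1.0]
    aug1 = [1.5, 0.8, -0.5]
    aug2 = [1.0, 1.2, -0.2]
    print(augmix_style_loss(clean, aug1, aug2, label=0))
```

Because the consistency term depends on the augmented views, the gradient a client computes is a function of the augmented images as well as the original one, which is the sense in which the augmentation distortion ends up embedded in the shared gradients. When all three views yield the same prediction, the term vanishes and the loss reduces to ordinary cross-entropy.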
