The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness
Mengyao Du, Miao Zhang, Yuwen Pu, Kai Xu, Shouling Ji, Quanjun Yin
TL;DR
Federated fine-tuning on privacy-sensitive, domain-specific data can skew feature representations and degrade out-of-distribution (OOD) robustness. The authors introduce three robustness indicators ($SVE$, $LSVR$, and $GDA$) and GNP, a general noisy projection-based algorithm that transfers robustness from the pre-trained model to the fine-tuned model while augmenting the model's representational capacity with a small amount of Gaussian noise. Through experiments on multiple NLP robustness datasets and several parameter-efficient fine-tuning (PEFT) methods, they show that data heterogeneity and the choice of PEFT method can undermine OOD robustness, and that GNP consistently improves robustness without sacrificing in-distribution performance. The proposed framework offers a practical, general approach to preserving OOD robustness in federated, parameter-efficient fine-tuning, with relevance to real-world NLP applications under privacy constraints.
Abstract
To tackle the scarcity and privacy issues associated with domain-specific datasets, combining federated learning with fine-tuning has emerged as a practical solution. However, our findings reveal that federated learning risks skewing fine-tuning features and compromising the out-of-distribution robustness of the model. By introducing three robustness indicators and conducting experiments across diverse robustness datasets, we explain these phenomena by examining the diversity, transferability, and deviation of the model feature space. To mitigate the negative impact of federated learning on model robustness, we introduce GNP, a General Noisy Projection-based robust algorithm that does not degrade accuracy on the target distribution. Specifically, the key strategy for enhancing model robustness is to transfer robustness from the pre-trained model to the fine-tuned model while adding a small amount of Gaussian noise to augment the representational capacity of the model. Comprehensive experimental results demonstrate that our approach markedly enhances robustness across diverse scenarios, including various parameter-efficient fine-tuning methods and different levels of data heterogeneity.
