FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated Learning
Rishub Tamirisa, John Won, Chengjun Lu, Ron Arel, Andy Zhou
TL;DR
FedSelect addresses the challenge of data heterogeneity in personalized federated learning by jointly personalizing client subnetworks and weights. It introduces GradLTN, a gradient-based lottery-ticket method that identifies a subnetwork to fine-tune locally while freezing the remaining parameters for global aggregation, and LocalAlt, which performs alternating updates guided by the discovered masks. The approach achieves state-of-the-art mean accuracies on CIFAR-10 in a low-client, full-participation setting, with higher personalization rates $p$ generally reducing communication while preserving global knowledge. These results suggest that fine-grained, parameter-level personalization preserves global knowledge and adapts to local distributions better than layer-wise personalization does. The work opens avenues for applying parameter-level subnetworks to other non-IID FL benchmarks and to broader datasets.
Abstract
Recent advancements in federated learning (FL) seek to increase client-level performance by fine-tuning client parameters on local data or personalizing architectures for the local task. Existing methods for such personalization either prune a global model or fine-tune a global model on a local client distribution. However, these methods either personalize at the expense of retaining important global knowledge, or predetermine which network layers to fine-tune, resulting in suboptimal storage of global knowledge within client models. Inspired by the lottery ticket hypothesis, we first introduce a hypothesis for finding optimal client subnetworks to fine-tune locally while leaving the remaining parameters frozen. We then propose a novel FL framework, FedSelect, that uses this procedure to directly personalize both client subnetwork structure and parameters, by simultaneously discovering the optimal parameters for personalization and the remaining parameters for global aggregation during training. We show that this method achieves promising results on CIFAR-10.
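The split between personalized and globally aggregated parameters can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The selection rule used here (personalize the fraction $p$ of parameters with the largest absolute change during local training) is an assumption, since this summary does not state the exact criterion, and the function names `select_personal_mask` and `aggregate_global` are hypothetical.

```python
import numpy as np

def select_personal_mask(theta_before, theta_after, p):
    """Return a boolean mask that is True for personalized parameters.

    Assumption: the fraction p of parameters with the largest absolute
    change during local training is chosen for personalization.
    """
    delta = np.abs(theta_after - theta_before).ravel()
    k = int(round(p * delta.size))
    mask = np.zeros(delta.size, dtype=bool)
    if k > 0:
        top = np.argsort(delta, kind="stable")[-k:]  # indices of the k largest changes
        mask[top] = True
    return mask.reshape(theta_before.shape)

def aggregate_global(client_params, client_masks):
    """Average each parameter over the clients that keep it global.

    Personalized entries (mask True) are left untouched on each client.
    """
    params = np.stack(client_params).astype(float)
    shared = ~np.stack(client_masks)           # True where a client shares the entry
    counts = shared.sum(axis=0)                # number of sharing clients per entry
    sums = (params * shared).sum(axis=0)
    avg = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    # Each client keeps its own personalized values and takes the average elsewhere.
    return [np.where(m, theta, avg) for theta, m in zip(params, client_masks)]

# Toy usage with two clients and four parameters.
mask_a = select_personal_mask(np.zeros(4), np.array([0.1, -0.9, 0.2, 0.5]), p=0.5)
mask_b = np.array([False, False, False, True])
new_a, new_b = aggregate_global(
    [np.array([1.0, 2.0, 3.0, 4.0]), np.array([3.0, 6.0, 5.0, 8.0])],
    [mask_a, mask_b],
)
```

In this toy run, client A personalizes the second and fourth parameters, so only client B contributes to the average of the second parameter, and neither client updates the fourth. Masking per parameter rather than per layer is what lets each client choose its own subnetwork.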
