Adaptive Model Pruning and Personalization for Federated Learning over Wireless Networks
Xiaonan Liu, Tharmalingam Ratnarajah, Mathini Sellathurai, Yonina C. Eldar
TL;DR
This work addresses the latency and accuracy challenges of federated learning over wireless networks with heterogeneous data. It proposes a framework that partitions the model into a pruned global part shared by all devices and a device-specific personalized part, reducing computation and communication while mitigating the effects of non-IID data. A convergence analysis gives an upper bound on the gradient norm and shows how pruning influences performance, while a KKT-based optimization yields closed-form solutions for the pruning ratios and bandwidth allocation under latency constraints. Experiments with CNNs on non-IID datasets show that the proposed approach cuts computation and communication latency by about 50% relative to FL with partial model personalization while achieving comparable learning accuracy, which makes it attractive for resource-constrained edge deployments.
Abstract
Federated learning (FL) enables distributed learning across edge devices while protecting data privacy. However, the learning accuracy decreases due to the heterogeneity of devices' data, and the computation and communication latency increase when updating large-scale learning models on devices with limited computational capability and wireless resources. We consider an FL framework with partial model pruning and personalization to overcome these challenges. This framework splits the learning model into a global part, which is pruned and shared with all devices to learn data representations, and a personalized part, which is fine-tuned for a specific device. Adapting the model size during FL in this way reduces both computation and communication latency and increases the learning accuracy for devices with non-independent and identically distributed (non-IID) data. The computation and communication latency and the convergence of the proposed FL framework are mathematically analyzed. To maximize the convergence rate and guarantee learning accuracy, the Karush-Kuhn-Tucker (KKT) conditions are applied to jointly optimize the pruning ratio and bandwidth allocation. Finally, experimental results demonstrate that the proposed FL framework reduces computation and communication latency by approximately 50 percent compared with FL with partial model personalization.
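To make the split between a pruned global part and a local personalized part concrete, the following is a minimal sketch and not the authors' implementation. It assumes magnitude-based pruning, a per-device pruning ratio chosen by hand, and a server that averages each global parameter over the devices that retained it; the abstract does not specify any of these details. All function and variable names (`magnitude_prune`, `aggregate`, `global_part`, `personal_parts`) are hypothetical, and the local training step is omitted.

```python
import numpy as np


def magnitude_prune(weights, ratio):
    """Zero the `ratio` fraction of entries with the smallest magnitude.

    Magnitude-based pruning is an assumption made for this sketch.
    Returns the pruned array and a boolean mask of retained entries.
    """
    k = int(ratio * weights.size)  # number of entries to remove
    mask = np.ones(weights.size, dtype=bool)
    if k > 0:
        smallest = np.argsort(np.abs(weights), axis=None)[:k]
        mask[smallest] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask


def aggregate(pruned_parts, masks, fallback):
    """Average each parameter over the devices that retained it.

    Parameters pruned by every device keep their `fallback` value
    (here, the previous global value). This rule is an assumption.
    """
    summed = np.stack(pruned_parts).sum(axis=0)
    counts = np.stack(masks).sum(axis=0)
    averaged = summed / np.maximum(counts, 1)  # avoid division by zero
    return np.where(counts > 0, averaged, fallback)


rng = np.random.default_rng(0)
global_part = rng.normal(size=(4, 4))  # shared representation layers
# Personalized parts stay on their devices and are never uploaded.
personal_parts = [rng.normal(size=(4,)) for _ in range(3)]

# Hypothetical per-device pruning ratios (weaker devices prune more).
ratios = [0.0, 0.25, 0.5]
results = [magnitude_prune(global_part, r) for r in ratios]
pruned = [p for p, _ in results]
masks = [m for _, m in results]

# Local training on each device would happen here; it is skipped in
# this sketch, so the aggregate simply recovers the original weights.
new_global = aggregate(pruned, masks, global_part)
```

Only the retained entries of the global part would need to be computed and transmitted by each device, which is where the latency saving described above would come from; the personalized parts add no communication cost because they never leave the device.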
