ProFe: Communication-Efficient Decentralized Federated Learning via Distillation and Prototypes
Pedro Miguel Sánchez Sánchez, Enrique Tomás Martínez Beltrán, Miguel Fernández Llamas, Gérôme Bovet, Gregorio Martínez Pérez, Alberto Huertas Celdrán
TL;DR
This work tackles the high communication cost of decentralized federated learning under data heterogeneity by introducing ProFe, a framework that combines knowledge distillation from large local models, prototype-based learning for unseen classes, and 16-bit quantization to shrink the data exchanged in each round. ProFe uses a teacher-student setup in which the smaller student models are the ones aggregated, leverages global prototypes to guide learning, and selectively reduces precision to cut bandwidth without significantly harming accuracy. Empirical results on MNIST, CIFAR10, and CIFAR100 show substantial communication savings (~40–50%) with comparable or improved performance, at the expense of a modest training-time increase (~18–20%). This makes decentralized learning more scalable in non-IID environments, balancing efficiency and accuracy for practical deployments.
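To make the bandwidth effect of 16-bit quantization concrete, the following minimal sketch compares the serialized size of the same parameters at 32-bit and 16-bit precision. It assumes the quantization is a plain cast to IEEE 754 half precision; the paper's exact scheme may differ, and the function names and weight values are hypothetical.

```python
import struct

def pack_fp32(weights):
    """Serialize weights as 32-bit floats (4 bytes each)."""
    return struct.pack(f"<{len(weights)}f", *weights)

def pack_fp16(weights):
    """Serialize weights as IEEE 754 half-precision floats (2 bytes each).

    Assumption: a plain cast to half precision stands in for the
    16-bit quantization step; this is not the authors' implementation.
    """
    return struct.pack(f"<{len(weights)}e", *weights)

def unpack_fp16(payload):
    """Recover the (lossy) float values from a half-precision payload."""
    count = len(payload) // 2
    return list(struct.unpack(f"<{count}e", payload))

weights = [0.125, -1.5, 0.3337, 2.0]  # hypothetical model parameters
full = pack_fp32(weights)
half = pack_fp16(weights)
restored = unpack_fp16(half)

saving = 1 - len(half) / len(full)  # fraction of bytes saved
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"bytes: {len(full)} -> {len(half)}, saving {saving:.0%}, max error {max_error:.6f}")
```

Halving the bytes per parameter bounds the saving on the quantized part of a message at 50%; any payload that is not quantized lowers the overall figure.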
Abstract
Decentralized Federated Learning (DFL) trains models collaboratively and in a privacy-preserving manner while removing the risks of model centralization and alleviating communication bottlenecks. However, DFL still faces challenges in managing communication efficiently and aggregating models in decentralized environments, especially under heterogeneous data distributions. To address these challenges, this paper introduces ProFe, a novel communication optimization algorithm for DFL that combines knowledge distillation, prototype learning, and quantization techniques. ProFe distills knowledge from large local models into smaller ones that are used for aggregation, incorporates prototypes to better learn unseen classes, and applies quantization to reduce the data transmitted during communication rounds. ProFe has been validated and compared against the literature on benchmark datasets such as MNIST, CIFAR10, and CIFAR100. Results show that the proposed algorithm reduces communication costs by roughly 40-50% while maintaining or improving model performance. In exchange, it adds about 20% to training time due to its increased complexity, a trade-off between communication efficiency and computation.
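The role of prototypes for unseen classes can be illustrated with a short sketch. It assumes a common formulation in which a prototype is the mean embedding of a class and neighbors' prototypes are combined by a sample-count-weighted average; the abstract does not specify ProFe's exact rule, so the function names, data, and aggregation scheme below are illustrative assumptions.

```python
from collections import defaultdict

def local_prototypes(embeddings, labels):
    """Return {class: (mean embedding, sample count)} for one node."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: ([s / counts[c] for s in sums[c]], counts[c]) for c in sums}

def aggregate_prototypes(per_node):
    """Count-weighted average of the per-class prototypes from several nodes.

    Assumption: weighting by sample count; other aggregation rules are possible.
    """
    sums = {}
    totals = defaultdict(int)
    for protos in per_node:
        for c, (mean, n) in protos.items():
            if c not in sums:
                sums[c] = [0.0] * len(mean)
            sums[c] = [s + n * m for s, m in zip(sums[c], mean)]
            totals[c] += n
    return {c: [s / totals[c] for s in sums[c]] for c in sums}

# Hypothetical 2-D embeddings: node A only ever sees class 0 (non-IID data).
node_a = local_prototypes([[1.0, 0.0], [3.0, 0.0]], [0, 0])
node_b = local_prototypes([[5.0, 0.0], [0.0, 2.0], [0.0, 4.0]], [0, 1, 1])

global_protos = aggregate_prototypes([node_a, node_b])
print(global_protos)  # node A now has a reference point for class 1
```

In this toy setup node A holds no class 1 samples, yet after aggregation it has a class 1 prototype that could serve as a target in its local loss.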
