Moss: Proxy Model-based Full-Weight Aggregation in Federated Learning with Heterogeneous Models
Yifeng Cai, Ziqi Zhang, Ding Li, Yao Guo, Xiangqun Chen
TL;DR
The paper tackles federated learning on heterogeneous devices by challenging the partial-model aggregation paradigm and introducing Moss, a full-weight aggregation framework. Moss combines proxy-model construction (PROM), weight-wise knowledge transfer (WIRE), and fidelity-guided aggregation (FILE) to achieve effective cross-architecture knowledge transfer and convergence. In extensive experiments on image classification, speech recognition, and human activity recognition (HAR), Moss achieves substantial accuracy gains, a ~63% reduction in on-device training time, and large reductions in energy use and network transmission compared to state-of-the-art baselines, while maintaining privacy. The approach also proves robust to unrelated public data and to differential privacy constraints, suggesting practical applicability in real-world heterogeneous FL deployments and meaningful impact on mobile/IoT AI services.
Abstract
Modern Federated Learning (FL) has become increasingly essential for handling highly heterogeneous mobile devices. Current approaches adopt a partial model aggregation paradigm that leads to sub-optimal model accuracy and higher training overhead. In this paper, we challenge the prevailing notion of partial-model aggregation and propose a novel "full-weight aggregation" method named Moss, which aggregates all weights within heterogeneous models to preserve comprehensive knowledge. Evaluation across various applications demonstrates that Moss significantly accelerates training, reduces on-device training time and energy consumption, enhances accuracy, and minimizes network bandwidth utilization when compared to state-of-the-art baselines.
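The abstract contrasts Moss with the partial-model aggregation paradigm. This summary does not say which partial scheme the paper targets, so the sketch below is an illustration only. It assumes HeteroFL-style nested width slicing, where each client's weight matrix is the top-left block of the global matrix and the server averages each entry over only the clients that hold it. The function name `partial_aggregate` is hypothetical, and the sketch does not reproduce Moss itself (PROM, WIRE, and FILE are not modeled here).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def partial_aggregate(global_w: Matrix, client_ws: List[Matrix]) -> Tuple[Matrix, List[List[int]]]:
    """Hypothetical HeteroFL-style partial-model aggregation (not Moss).

    Each client weight matrix is assumed to be the top-left block of the
    global matrix. Every global entry is averaged over only the clients that
    hold it; entries no client holds keep their previous global value.
    Returns the new global matrix and the per-entry contributor counts.
    """
    rows, cols = len(global_w), len(global_w[0])
    total = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    for w in client_ws:
        # A client's slice must fit inside the global matrix.
        if len(w) > rows or any(len(r) > cols for r in w):
            raise ValueError("client sub-model is larger than the global model")
        for i, row in enumerate(w):
            for j, value in enumerate(row):
                total[i][j] += value
                count[i][j] += 1
    new_w = [
        [total[i][j] / count[i][j] if count[i][j] else global_w[i][j] for j in range(cols)]
        for i in range(rows)
    ]
    return new_w, count


# Usage: a small client holds a 1x1 slice, a large client holds the full 2x2 matrix.
global_w = [[0.0, 0.0], [0.0, 0.0]]
small = [[1.0]]
large = [[3.0, 4.0], [5.0, 6.0]]
new_w, count = partial_aggregate(global_w, [small, large])
print(new_w)   # [[2.0, 4.0], [5.0, 6.0]]
print(count)   # [[2, 1], [1, 1]]
```

In this toy setting, only the entry shared by both clients is averaged across them; the remaining entries reflect the large client alone. This illustrates, under the stated assumptions, how partial aggregation leaves parts of the model informed by fewer devices, which is the kind of knowledge loss that motivates aggregating all weights.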
