Moss: Proxy Model-based Full-Weight Aggregation in Federated Learning with Heterogeneous Models

Yifeng Cai, Ziqi Zhang, Ding Li, Yao Guo, Xiangqun Chen

TL;DR

The paper addresses federated learning on heterogeneous devices by challenging the partial-model aggregation paradigm and introducing Moss, a full-weight aggregation framework. Moss combines proxy-model construction (PROM), weight-wise knowledge transfer (WIRE), and fidelity-guided aggregation (FILE) to transfer knowledge across architectures and reach convergence. In experiments on image classification, speech recognition, and human activity recognition (HAR), Moss shows substantial accuracy gains, a ~63% reduction in on-device training time, and large reductions in energy use and network transmission compared to state-of-the-art baselines, while maintaining privacy. The approach is also reported to be robust to unrelated public data and to differential privacy constraints, which suggests it is practical for real-world heterogeneous FL deployments on mobile and IoT devices.

Abstract

Modern Federated Learning (FL) has become increasingly essential for handling highly heterogeneous mobile devices. Current approaches adopt a partial model aggregation paradigm that leads to sub-optimal model accuracy and higher training overhead. In this paper, we challenge the prevailing notion of partial-model aggregation and propose a novel "full-weight aggregation" method named Moss, which aggregates all weights within heterogeneous models to preserve comprehensive knowledge. Evaluation across various applications demonstrates that Moss significantly accelerates training, reduces on-device training time and energy consumption, enhances accuracy, and minimizes network bandwidth utilization when compared to state-of-the-art baselines.

Paper Structure

This paper contains 41 sections, 10 equations, 11 figures, 4 tables, 2 algorithms.

Figures (11)

  • Figure 1: Illustration of pruning-based solutions and distillation-based solutions.
  • Figure 2: Framework of Moss.
  • Figure 3: Design of WIRE. The width of the arrows represents the value of the transfer location/degree.
  • Figure 4: Comparison of FL rounds to achieve convergence.
  • Figure 5: Comparison of the total time to complete FL for the devices.
  • ...and 6 more figures

Theorems & Definitions (1)

  • Definition 1