FedShift: Robust Federated Learning Aggregation Scheme in Resource Constrained Environment via Weight Shifting

Jungwon Seo; Minhoe Kim; Chunming Rong

TL;DR

Federated Learning faces high communication overhead, and mixed-precision aggregation can introduce quantization bias and client drift. The authors propose FedShift, a weight-shifting aggregation scheme that uses the non-quantized reference to align mixed-precision updates. It shifts the inferior clients' weights by a mean term $m^{t+1}$, resulting in $\hat{\mathbf{w}}^{t+1}=\mathbf{w}^{t+1}-\frac{I}{K}m^{t+1}\mathbf{1}_P$. FedShift is designed as a lightweight server-side add-on compatible with existing FL optimization algorithms, and the paper provides convergence and divergence analyses under non-IID data. Empirical results on CIFAR-10 with mixed precision show that FedShift improves accuracy across bit-widths, reduces label bias and model drift, and can outperform full-precision FedAvg in some settings.
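The shifted aggregate above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: it reads $K$ as the total number of clients, $I$ as the number of inferior (quantized) clients, and $P$ as the number of parameters, and it takes $m^{t+1}$ to be the mean over all entries of the inferior clients' weights. None of these readings is confirmed by the summary itself. The function name `fedshift_aggregate` is hypothetical.

```python
import numpy as np

def fedshift_aggregate(superior, inferior):
    """Sketch of w_hat = w - (I/K) * m * 1_P.

    superior, inferior: lists of 1-D weight vectors of length P.
    ASSUMPTION: m is the mean of all entries of the inferior clients'
    weights; the source only calls it "a mean term".
    """
    K = len(superior) + len(inferior)   # total number of clients (assumed meaning)
    I = len(inferior)                   # number of inferior clients (assumed meaning)
    w = np.mean(np.stack(superior + inferior), axis=0)  # plain uniform average
    m = float(np.mean(np.stack(inferior))) if I > 0 else 0.0
    w_hat = w - (I / K) * m * np.ones_like(w)
    return w_hat, m

rng = np.random.default_rng(0)
superior = [rng.normal(size=5) for _ in range(3)]
inferior = [rng.normal(loc=0.5, size=5) for _ in range(2)]
w_hat, m = fedshift_aggregate(superior, inferior)

# Equivalent view: subtract m from every inferior client, then average uniformly.
manual = np.mean(np.stack(superior + [w - m for w in inferior]), axis=0)
print(np.allclose(w_hat, manual))
```

The check at the end illustrates why the factor $\frac{I}{K}$ appears: shifting each of the $I$ inferior clients by the same scalar before a uniform average over $K$ clients moves the average by exactly $\frac{I}{K}$ times that scalar.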

Abstract

Federated Learning (FL) commonly relies on a central server to coordinate training across distributed clients. While effective, this paradigm suffers from significant communication overhead, impacting overall training efficiency. To mitigate this, prior work has explored compression techniques such as quantization. However, in heterogeneous FL settings, clients may employ different quantization levels based on their hardware or network constraints, necessitating a mixed-precision aggregation process at the server. This introduces additional challenges, exacerbating client drift and leading to performance degradation. In this work, we propose FedShift, a novel aggregation methodology designed to mitigate performance degradation in FL scenarios with mixed quantization levels. FedShift employs a statistical matching mechanism based on weight shifting to align mixed-precision models, thereby reducing model divergence and addressing quantization-induced bias. Our approach functions as an add-on to existing FL optimization algorithms, enhancing their robustness and improving convergence. Empirical results demonstrate that FedShift effectively mitigates the negative impact of mixed-precision aggregation, yielding superior performance across various FL benchmarks.
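To make the mixed-precision setting concrete, the following is a minimal sketch of one common way to quantize and dequantize a weight vector at a chosen bit-width. It assumes uniform min-max quantization, which the abstract does not specify; the paper's actual quantizer may differ. The helper names `quantize` and `dequantize` are hypothetical.

```python
import numpy as np

def quantize(w, bits):
    """Uniform min-max quantization to 2**bits levels (an assumed scheme)."""
    lo, hi = float(w.min()), float(w.max())
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid division by zero
    q = np.round((w - lo) / scale).astype(np.int64)  # integer codes in [0, levels]
    return q, lo, scale

def dequantize(q, lo, scale):
    """Map integer codes back to floating-point weights."""
    return q * scale + lo

w = np.linspace(-1.0, 1.0, 101)
for bits in (4, 8):
    q, lo, scale = quantize(w, bits)
    err = np.max(np.abs(dequantize(q, lo, scale) - w))
    # The rounding error is at most half a quantization step.
    print(bits, err <= scale / 2 + 1e-12)
```

In a mixed-precision round, clients with different `bits` would each send their codes with `lo` and `scale`, and the server would dequantize before averaging. Lower bit-widths give a coarser grid, which is one source of the quantization-induced bias the abstract refers to.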

Paper Structure (31 sections, 5 theorems, 38 equations, 10 figures, 4 tables, 2 algorithms)

This paper contains 31 sections, 5 theorems, 38 equations, 10 figures, 4 tables, 2 algorithms.

Introduction
Related Works
Problem Setup
Federated Learning
System Model
Weight Quantization
Proposed Method: FedShift
FedShift - Weight Shifting Aggregation
Theoretical Analysis
Experiment and Result
Experimental Setup
Model Architecture
Dataset and Data Partitioning
Training Configuration
Quantization Setting
...and 16 more sections

Key Result

Theorem 1

Under Assumptions AS1 to AS5 ($L$-smoothness and $\mu$-convexity of the function, bounded gradients, and bounded weight average), for the choice $\beta = \frac{2}{\mu}$, $\gamma = \max\{ \frac{8L}{\mu}, E\}-1$, $\varepsilon > 0$, and $\eta_\tau=\frac{2}{\mu}\frac{1}{\gamma+\tau}$, the error after … where $B_1$, $B_2$, and $M$ are values determined by the bounds and $F^*$ is the minimum of the function.
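The step-size choice in Theorem 1 is a decaying schedule of the form $\eta_\tau = \frac{\beta}{\gamma+\tau}$. A small sketch of how it could be evaluated is given below. The function name `step_size` and the numeric values are hypothetical, and the meaning of $E$ and the starting index of $\tau$ are not stated in this summary, so they are treated as plain inputs.

```python
def step_size(tau, mu, L, E):
    """eta_tau = (2/mu) / (gamma + tau), with gamma = max(8L/mu, E) - 1."""
    beta = 2.0 / mu
    gamma = max(8.0 * L / mu, E) - 1.0
    return beta / (gamma + tau)

# Illustrative values only (not taken from the paper).
mu, L, E = 1.0, 1.0, 5
schedule = [step_size(tau, mu, L, E) for tau in range(1, 6)]
print(schedule)
```

With these illustrative values, $\gamma = \max\{8, 5\} - 1 = 7$, so $\eta_1 = \frac{2}{8} = 0.25$ and the step size shrinks roughly as $\frac{1}{\tau}$ from there.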

Figures (10)

Figure 1: Training flow with inferior group incorporating quantization process (mixed-precision)
Figure 2: Illustration of quantization and dequantization process with various bit widths
Figure 3: The effect of 4-bit quantization techniques on the weight distribution over epochs during a single neural network training process.
Figure 4: Concept visualization of FedShift
Figure 5: Illustration of FedShift balancing divergence
...and 5 more figures

Theorems & Definitions (10)

Theorem 1
proof
Theorem 2
proof
Lemma 1
Lemma 2
Lemma 3
proof
proof
proof
