Extra Clients at No Extra Cost: Overcome Data Heterogeneity in Federated Learning with Filter Decomposition

Wei Chen, Qiang Qiu

TL;DR

This work tackles data heterogeneity in federated learning by decomposing convolutional layers into filter atoms $\mathbf{D}$ and atom coefficients $\boldsymbol{\alpha}$, which yields many latent local model variants at no extra cost. Global aggregation becomes a reconstruction step: $\boldsymbol{\alpha}$ and $\mathbf{D}$ are averaged separately and then multiplied, producing a global model $\boldsymbol{\theta}_{\boldsymbol{\phi}} = \boldsymbol{\alpha} \times \mathbf{D}$ that effectively expands the ensemble of participating clients. The authors provide variance-reduction and convergence analyses and demonstrate consistent accuracy gains across FL baselines on CIFAR-10/100 and Tiny-ImageNet, along with improved personalization and communication efficiency via a fast/slow update scheme. The approach is straightforward to integrate with existing FL methods and offers flexible personalization and communication schedules, with the most notable improvements on challenging datasets such as Tiny-ImageNet.

Abstract

Data heterogeneity is one of the major challenges in federated learning (FL), resulting in substantial client variance and slow convergence. In this study, we propose a novel solution: decomposing a convolutional filter in FL into a linear combination of filter subspace elements, i.e., filter atoms. This simple technique transforms global filter aggregation in FL into aggregating filter atoms and their atom coefficients. The key advantage is that expanding the product of the two weighted sums, one over filter atoms and one over atom coefficients, mathematically generates numerous cross-terms. These cross-terms effectively emulate many additional latent clients, significantly reducing model variance, which is validated by our theoretical analysis and empirical observation. Furthermore, our method permits different training schemes for filter atoms and atom coefficients, enabling highly adaptive model personalization and communication efficiency. Empirical results on benchmark datasets demonstrate that our filter decomposition technique substantially improves the accuracy of FL methods, confirming its efficacy in addressing data heterogeneity.
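
The cross-terms mentioned in the abstract can be made explicit with a short expansion. This is a sketch under assumed notation: $m$ clients, client $k$ holding $n_k$ of the $n$ total samples, with local coefficients $\boldsymbol{\alpha}_k$ and local atoms $\mathbf{D}_k$ (the per-client subscripts are introduced here for illustration and may differ from the paper's own symbols).

$$
\boldsymbol{\theta}_{\boldsymbol{\phi}}
= \Big(\sum_{k_1=1}^{m} \frac{n_{k_1}}{n}\,\boldsymbol{\alpha}_{k_1}\Big) \times \Big(\sum_{k_2=1}^{m} \frac{n_{k_2}}{n}\,\mathbf{D}_{k_2}\Big)
= \sum_{k_1=1}^{m}\sum_{k_2=1}^{m} \frac{n_{k_1} n_{k_2}}{n^{2}}\,\boldsymbol{\alpha}_{k_1} \times \mathbf{D}_{k_2}
$$

Of the $m^2$ terms on the right, the $m$ terms with $k_1 = k_2$ correspond to models that some client actually trained, while the remaining $m(m-1)$ terms pair one client's coefficients with another client's atoms. Under this reading, those mixed terms are the "latent clients" the abstract refers to.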

Paper Structure

This paper contains 57 sections, 3 theorems, 51 equations, 3 figures, 3 tables, 2 algorithms.

Key Result

Proposition 5.1

Consider $\boldsymbol{\theta}_{k, \boldsymbol{\phi}}$ and $\boldsymbol{\theta}_{k_1, k_2, \boldsymbol{\phi}}$ as independent random variables, the parameter obtained by methods without filter decomposition is $\boldsymbol{\theta}_{\boldsymbol{\phi}}=\sum_{k=1}^{m} \frac{n_k}{n} \boldsymbol{\theta}_{

Figures (3)

  • Figure 1: (a) The aggregation of convolutional filters, e.g., FedAvg (McMahan et al., 2017). (b) We decompose the convolutional filters into filter atoms $\mathbf{D}$ and atom coefficients $\boldsymbol{\alpha}$. During the aggregation phase, we separately average the filter atoms $\mathbf{D}$ and atom coefficients $\boldsymbol{\alpha}$, and subsequently reconstruct the global model by multiplying the aggregated $\boldsymbol{\alpha}$ and aggregated $\mathbf{D}$. In contrast to conventional FL aggregation methods like FedAvg, this mathematical operation naturally leads to additional local model variants, significantly reducing the variance of local updates without introducing extra computation cost or communication overhead.
  • Figure 2: Filter decomposition naturally introduces extra latent clients, offering several advantages to FL: (a) It accelerates the model's convergence speed and increases its accuracy. (b) Employing filter decomposition minimizes variance and maintains this reduction. (c) As the number of clients increases, the test accuracy improves. With filter decomposition, additional latent clients boost this accuracy even further. (d) Our approach also reduces communication costs, as evidenced by a comparison of the parameters communicated to reach the same accuracy.
  • Figure 3: (a) The loss landscape shows that additional clients reduce variance and improve training stability, leading to faster convergence. We use FedAvg as an example to depict the impact of filter decomposition on the training loss. (b) Our filter decomposition method shows lower training loss than FedAvg. (c) As visualized in the loss landscape, our method reaches a lower loss faster than FedAvg.
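
The aggregation step described in the Figure 1 caption can be sketched numerically. The following is a minimal NumPy illustration, not the authors' implementation: the tensor shapes (coefficients of shape `(c_out, c_in, n_atoms)`, atoms of shape `(n_atoms, h, w)`), the function names, and the use of sample-count weights are all assumptions made for this example. It averages coefficients and atoms separately, reconstructs the global filter, and compares the result with an explicit weighted sum over every (coefficient, atom) client pair.

```python
import numpy as np


def reconstruct(alpha, atoms):
    """Build filters of shape (c_out, c_in, h, w) from coefficients and atoms.

    Assumed shapes: alpha is (c_out, c_in, n_atoms), atoms is (n_atoms, h, w).
    """
    return np.einsum("oim,mhw->oihw", alpha, atoms)


def client_weights(sizes):
    """Normalize per-client sample counts into aggregation weights n_k / n."""
    w = np.asarray(sizes, dtype=float)
    return w / w.sum()


def aggregate_decomposed(alphas, atoms_list, sizes):
    """Average coefficients and atoms separately, then reconstruct the filter."""
    w = client_weights(sizes)
    alpha_avg = np.tensordot(w, np.stack(alphas), axes=1)
    atoms_avg = np.tensordot(w, np.stack(atoms_list), axes=1)
    return reconstruct(alpha_avg, atoms_avg)


def aggregate_all_pairs(alphas, atoms_list, sizes):
    """Weighted sum over every (k1, k2) pairing of coefficients and atoms."""
    w = client_weights(sizes)
    total = 0.0
    for k1, alpha in enumerate(alphas):
        for k2, atoms in enumerate(atoms_list):
            total = total + w[k1] * w[k2] * reconstruct(alpha, atoms)
    return total


def aggregate_fedavg(alphas, atoms_list, sizes):
    """Baseline: reconstruct each client's filter first, then average."""
    w = client_weights(sizes)
    filters = np.stack([reconstruct(a, d) for a, d in zip(alphas, atoms_list)])
    return np.tensordot(w, filters, axes=1)
```

Because the reconstruction is bilinear, `aggregate_decomposed` and `aggregate_all_pairs` agree up to floating-point error, while `aggregate_fedavg` only covers the pairs where both factors come from the same client. The separate-averaging path performs two averages and one reconstruction, so it never materializes the pairwise terms explicitly.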

Theorems & Definitions (6)

  • Proposition 5.1
  • Theorem 5.5
  • Remark 5.6
  • Remark 5.7
  • Remark 5.8
  • Theorem 6.1