FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion

Zhenheng Tang, Yonggang Zhang, Peijie Dong, Yiu-ming Cheung, Amelie Chi Zhou, Bo Han, Xiaowen Chu

TL;DR

This work takes a causal view and finds that the performance drop of OFL methods stems from the isolation problem: models trained in isolation on each client can easily fit spurious correlations caused by data heterogeneity.

Abstract

One-shot Federated Learning (OFL) significantly reduces communication costs in FL by aggregating trained models only once. However, the performance of advanced OFL methods lags far behind that of normal FL. In this work, we take a causal view and find that this performance drop stems from the isolation problem: models trained in isolation on each client can easily fit spurious correlations caused by data heterogeneity. From the causal perspective, we observe that this spurious fitting can be alleviated by augmenting intermediate features with those from other clients. Building on this observation, we propose FuseFL, a novel learning approach that gives OFL strong performance with low communication and storage costs. Specifically, FuseFL decomposes neural networks into several blocks, then progressively trains and fuses each block in a bottom-up manner for feature augmentation, introducing no additional communication costs. Comprehensive experiments demonstrate that FuseFL outperforms existing OFL and ensemble FL methods by a significant margin. Further experiments show that FuseFL scales to large numbers of clients, supports heterogeneous model training, and has low memory costs. Our work is the first attempt to use causality to analyze and alleviate data heterogeneity in OFL.

Paper Structure

This paper contains 40 sections, 1 theorem, 6 equations, 9 figures, 14 tables, 1 algorithm.

Key Result

Lemma 3.1

Given a spurious feature $R^{\text{spu}}$ for the label $Y$, and the probabilistic graphical model $(R^{\text{spu}}, Y) \to X \to H(X)$, then, There is a nuisance $R^{\text{spu}}$ such that equality holds up to a residual $\epsilon$: where $\epsilon \triangleq I(H(X);Y|R^{\text{spu}}) - I(X;Y)$. The sufficient statistic $H(X)$ (satisfying Eq. eq:IB-X) is invariant to $R^{\text{spu}}$ if and

Figures (9)

  • Figure 1: Structural Equation Model (pearl2009causality; Resnet) of FL.
  • Figure 2: Estimated MI and separability of trained models with non-IID datasets.
  • Figure 3: Estimated MI and separability of trained models with non-IID backdoored datasets.
  • Figure 4: (a) Initially, all layers are trained in isolation. Note that "layer" here does not mean only a single layer such as one Conv layer; it refers generally to a neural network block that can consist of multiple layers. (b) Next, the first blocks (L1) of all clients are communicated, shared, and frozen among clients. Adaptors are then added after the fused block to fuse the features output by the concatenated local blocks. (c) The third blocks (L3) are trained following a process similar to (b). (d) The inference process of FuseFL. The larger squares represent the original training blocks in the local models. The smaller squares are adaptors that fuse features from previous modules; they are $1\times1$ Conv kernels or simple averaging operations with little or no memory cost. Note that (a) also represents local training in ensemble FL, where different clients train models on their local datasets.
  • Figure 5: Each row is a class of original (upper) and backdoored (lower) images of CIFAR-10. Shapes are added to the images according to their label indices.
  • ...and 4 more figures
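
The adaptors described in the Figure 4 caption (a $1\times1$ Conv kernel or a simple average over features from several clients' blocks) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `fuse_average` and `fuse_conv1x1`, the shapes, and the random features are assumptions made for illustration. It only shows that a $1\times1$ convolution over channel-concatenated features is a per-pixel linear map, and that averaging is the special case of one particular weight matrix.

```python
import numpy as np

def fuse_average(features):
    """Average K feature maps of identical shape (C, H, W); uses no parameters."""
    return np.mean(np.stack(features, axis=0), axis=0)

def fuse_conv1x1(features, weight):
    """Concatenate K feature maps along channels, then apply a 1x1 convolution.

    weight has shape (C_out, K * C). A 1x1 convolution is a linear map over
    channels applied independently at every spatial position.
    """
    stacked = np.concatenate(features, axis=0)          # (K * C, H, W)
    return np.einsum("oc,chw->ohw", weight, stacked)    # (C_out, H, W)

# Hypothetical setup: K clients, each producing a (C, H, W) feature map.
rng = np.random.default_rng(0)
K, C, H, W = 3, 4, 8, 8
client_features = [rng.standard_normal((C, H, W)) for _ in range(K)]

avg = fuse_average(client_features)

# A 1x1 weight made of K scaled identity blocks reproduces plain averaging.
avg_weight = np.concatenate([np.eye(C) / K] * K, axis=1)  # (C, K * C)
conv = fuse_conv1x1(client_features, avg_weight)
```

In this sketch the averaging adaptor stores nothing, while the $1\times1$ adaptor stores `C_out * K * C` weights, which is consistent with the caption's description of adaptors as having little or no memory cost.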

Theorems & Definitions (2)

  • Lemma 3.1: Invariance and minimality (achille2018emergence)
  • Remark 3.1
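
Lemma 3.1 is stated in terms of mutual information between features, labels, and a spurious feature. As a reminder of what such a quantity measures, here is a minimal sketch that computes $I(A;B)$ exactly from a small discrete joint probability table. The two tables are made-up illustrative numbers, not data from the paper, and the paper's own mutual-information estimates on trained models may use a different estimator.

```python
import numpy as np

def mutual_information(joint):
    """Return I(A;B) in nats for a joint probability table joint[a, b]."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                 # normalise to a distribution
    pa = joint.sum(axis=1, keepdims=True)       # marginal of A, shape (n_a, 1)
    pb = joint.sum(axis=0, keepdims=True)       # marginal of B, shape (1, n_b)
    mask = joint > 0                            # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (pa * pb)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

# Hypothetical binary spurious feature that agrees with the label 90% of the time.
spurious_joint = np.array([[0.45, 0.05],
                           [0.05, 0.45]])
# Hypothetical feature that is independent of the label.
independent_joint = np.array([[0.25, 0.25],
                              [0.25, 0.25]])

mi_spurious = mutual_information(spurious_joint)
mi_independent = mutual_information(independent_joint)
```

A feature that co-occurs with the label on one client's data carries positive mutual information with that label, so a model trained only on that client has an incentive to use it; an independent feature carries none.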