Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators

Sikai Bai, Shuaicheng Li, Weiming Zhuang, Jie Zhang, Song Guo, Kunlin Yang, Jun Hou, Shuai Zhang, Junyu Gao, Shuai Yi

TL;DR

This work proposes a novel FSSL framework with dual regulators, FedDure, and demonstrates that FedDure is superior to the existing methods across a wide range of settings, notably by more than 11% on CIFAR-10 and CINIC-10 datasets.

Abstract

Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume independent and identically distributed (IID) labeled data across clients and consistent class distribution between labeled and unlabeled data within a client. This work studies a more practical and challenging scenario of FSSL, where data distribution is different not only across clients but also within a client between labeled and unlabeled data. To address this challenge, we propose a novel FSSL framework with dual regulators, FedDure. FedDure lifts the previous assumption with a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg): C-reg regularizes the updating of the local model by tracking the learning effect on labeled data distribution; F-reg learns an adaptive weighting scheme tailored for unlabeled instances in each client. We further formulate the client model training as bi-level optimization that adaptively optimizes the model in the client with two regulators. Theoretically, we show the convergence guarantee of the dual regulators. Empirically, we demonstrate that FedDure is superior to the existing methods across a wide range of settings, notably by more than 11% on CIFAR-10 and CINIC-10 datasets.
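As a rough illustration of what an instance-level weighting of unlabeled data can look like, the sketch below scales a pseudo-label cross-entropy by one weight per unlabeled sample. It is not the authors' implementation: the function names, the weak/strong-view setup, and `toy_weight_fn` (a fixed sigmoid over class probabilities) are assumptions made for illustration, whereas the paper's F-reg learns its weights through bi-level optimization.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def weighted_unlabeled_loss(weak_logits, strong_logits, weights):
    """Per-sample weighted cross-entropy on unlabeled data.

    Pseudo-labels are the argmax predictions on a weakly augmented view;
    the loss is evaluated on predictions for a strongly augmented view.
    `weights` holds one scalar per unlabeled sample (hypothetical input).
    """
    pseudo = softmax(weak_logits).argmax(axis=1)
    strong_probs = softmax(strong_logits)
    ce = -np.log(strong_probs[np.arange(len(pseudo)), pseudo] + 1e-12)
    return float(np.mean(weights * ce)), pseudo

def toy_weight_fn(weak_logits, w, b):
    # Hypothetical stand-in for a learned weighting function:
    # a sigmoid of a linear map of the predicted class probabilities.
    probs = softmax(weak_logits)
    return 1.0 / (1.0 + np.exp(-(probs @ w + b)))

# Usage: two unlabeled samples, three classes.
weak = np.array([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
strong = np.array([[1.0, 0.5, 0.0], [0.2, 0.1, 0.9]])
weights = toy_weight_fn(weak, w=np.array([1.0, -1.0, 0.5]), b=0.0)
loss, pseudo_labels = weighted_unlabeled_loss(weak, strong, weights)
```

A weight near zero effectively removes a sample from the local update, which is the intuition behind down-weighting unlabeled instances whose pseudo-labels are unreliable.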

Paper Structure (13 sections, 2 theorems, 12 equations, 5 figures, 2 tables)

Key Result

Theorem 1

Suppose that the supervised loss function $\mathcal{L}_{ce}(\textbf{y}, f_d(\textbf{x}; \pmb{\phi}^{t+1}(\pmb{\theta}_l^{t})))$ is $L$-Lipschitz and has $\rho$-bounded gradients. The $\mathcal{L}_{ce}\left(\hat{\textbf{y}}, f_d\left(\mathcal{T}_s(\textbf{u}); \pmb{\phi}^{t}\right)\right)$ has $\rho$-boun

Figures (5)

  • Figure 1: Existing federated semi-supervised learning (FSSL) methods cannot address heterogeneity between labeled and unlabeled data within a client (internal imbalance) and heterogeneous data across clients (external imbalance); some of them are even worse than supervised FL using 10% data (green line, which is FedAvg* in the comparison table). Our proposed FedDure significantly outperforms existing methods. These experiments are based on three runs on CIFAR-10, and we provide more description in the Experiments section.
  • Figure 2: Illustration of Federated Semi-Supervised Learning Framework with Dual Regulator (FedDure). FedDure contains a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg) to adaptively guide local updates in each client: C-reg dynamically regulates the importance of local training on the unlabeled data by reflecting the overall learning effect on labeled data; F-reg regulates the performance contribution of each unlabeled sample.
  • Figure 3: Comparison of data distribution between FedMatch (Jeong et al., 2020) and our (DIR, DIR) setting: (a) and (b) are the labeled and unlabeled data distributions used in FedMatch, respectively; our data distribution in (c) and (d) presents external imbalance across clients and internal imbalance between labeled and unlabeled data inside a client.
  • Figure 4: Impact of different Dirichlet coefficients under (IID, DIR) and (DIR, DIR) settings on the CIFAR-10 dataset.
  • Figure 5: Analysis of the impacts of the number of labeled data and selected clients. (a) and (b) illustrate that FedDure consistently outperforms FedMatch and Baseline (FedAvg-Fixmatch) using different percentages of labeled data. (c) and (d) show that FedDure scales with increasing numbers of selected clients on CIFAR-10 and Fashion-MNIST datasets.
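The Dirichlet coefficients mentioned in the captions above refer to a way of simulating non-IID data. The sketch below shows one common convention from the federated learning literature, in which each class is split across clients according to proportions drawn from a Dirichlet distribution. Whether the paper's DIR setting uses exactly this procedure is an assumption, and the function name `dirichlet_partition` and its parameters are hypothetical.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet-skewed class shares.

    Smaller `alpha` gives more skewed (more heterogeneous) client label
    distributions; larger `alpha` approaches an even split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Share of class c assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert cumulative shares into split points within idx.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return [np.array(ix, dtype=int) for ix in client_indices]

# Usage: 10 classes with 50 samples each, split over 5 clients.
toy_labels = np.repeat(np.arange(10), 50)
client_parts = dirichlet_partition(toy_labels, num_clients=5, alpha=0.5)
```

Partitioning the labeled and unlabeled pools independently (for example, with different seeds) is one way to obtain clients whose labeled and unlabeled class distributions differ, which mirrors the internal imbalance described in Figure 3.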

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2