Balancing Label Imbalance in Federated Environments Using Only Mixup and Artificially-Labeled Noise

Kyle Sang, Tahseen Rabbani, Furong Huang

TL;DR

Small amounts of augmentation via mixups and natural noise markedly improve training on label-skewed CIFAR-10 and MNIST.

Abstract

Clients in a distributed or federated environment will often hold data skewed towards differing subsets of labels. This scenario, referred to as heterogeneous or non-iid federated learning, has been shown to significantly hinder model training and performance. In this work, we explore the limits of a simple yet effective augmentation strategy for balancing skewed label distributions: filling in underrepresented samples of a particular label class using pseudo-images. While existing algorithms exclusively train on pseudo-images such as mixups of local training data, our augmented client datasets consist of both real and pseudo-images. In further contrast to other literature, we (1) use a DP-Instahide variant to reduce the decodability of our image encodings and (2) as a twist, supplement local data using artificially labeled, training-free 'natural noise' generated by an untrained StyleGAN. These noisy images mimic the power spectra patterns present in natural scenes which, together with mixup images, help homogenize label distribution among clients. We demonstrate that small amounts of augmentation via mixups and natural noise markedly improve label-skewed CIFAR-10 and MNIST training.
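
The balancing idea described above can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the helper names (`label_deficits`, `mixup_pairs`), the uniform mixing weight, the soft-label rule, and the optional Laplace noise (a stand-in for the DP-Instahide variant the abstract mentions) are all assumptions made for illustration.

```python
import numpy as np

def label_deficits(labels, num_classes, target=None):
    """Count how many extra samples each class needs to reach `target`.

    Hypothetical helper: by default the target is the size of the client's
    largest class, which is an assumption, not a rule from the paper.
    """
    counts = np.bincount(labels, minlength=num_classes)
    if target is None:
        target = counts.max()
    return np.maximum(target - counts, 0)

def mixup_pairs(images, labels, num_classes, n_out, noise_scale=0.0, rng=None):
    """Generate `n_out` 2-way mixup pseudo-images with soft labels.

    Each output is lam * x_i + (1 - lam) * x_j for randomly chosen i, j.
    If noise_scale > 0, Laplace noise is added (assumed stand-in for a
    DP-Instahide-style perturbation).
    """
    rng = np.random.default_rng() if rng is None else rng
    i = rng.integers(0, len(images), size=n_out)
    j = rng.integers(0, len(images), size=n_out)
    lam = rng.uniform(0.0, 1.0, size=n_out)
    # Reshape lam so it broadcasts over the image dimensions.
    lam_img = lam.reshape(-1, *([1] * (images.ndim - 1)))
    mixed = lam_img * images[i] + (1.0 - lam_img) * images[j]
    if noise_scale > 0:
        mixed = mixed + rng.laplace(0.0, noise_scale, size=mixed.shape)
    onehot = np.eye(num_classes)[labels]
    soft = lam[:, None] * onehot[i] + (1.0 - lam[:, None]) * onehot[j]
    return mixed, soft
```

A client would keep its real images and append only as many pseudo-images as the deficits call for, which matches the abstract's point that the augmented datasets contain both real and pseudo-images. How the paper assigns labels to mixups and natural-noise images is not stated in this excerpt, so the soft labels here are only one plausible choice.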

Paper Structure (10 sections, 2 figures, 3 tables, 2 algorithms)


Figures (2)

  • Figure 1: Pseudo-images. We augment local training data with two varieties of pseudo-images: mixup and natural noise. Figure \ref{fig:cifarmix} depicts 2-way mixup of CIFAR-10 using Algorithm \ref{alg:imagemixup}. Figure \ref{fig:natural-noise} depicts unlabeled, StyleGAN-Oriented natural images.
  • Figure 2: Extreme Label-Imbalance. A visualization of $C=1$ label skew. Hybrid strategies (mixup + natural noise) generally outperform supplements consisting solely of mixups. Natural images boost performance at lower supplements.
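
To make the label-skew setting in Figure 2 concrete, here is a minimal sketch of a partition in which each client holds samples from a fixed number of classes. It assumes that $C$ denotes the number of distinct labels per client, which this excerpt does not define. The function name `label_skew_partition` and the round-robin class assignment are hypothetical.

```python
import numpy as np

def label_skew_partition(labels, num_clients, classes_per_client, num_classes, rng=None):
    """Split sample indices so each client sees only `classes_per_client` classes.

    Classes are assigned to clients round-robin (an illustrative choice);
    the samples of each class are divided evenly among the clients holding it.
    """
    rng = np.random.default_rng() if rng is None else rng
    client_classes = [
        {(c * classes_per_client + k) % num_classes for k in range(classes_per_client)}
        for c in range(num_clients)
    ]
    parts = [[] for _ in range(num_clients)]
    for cls in range(num_classes):
        holders = [c for c in range(num_clients) if cls in client_classes[c]]
        if not holders:
            continue  # no client was assigned this class
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for h, chunk in zip(holders, np.array_split(idx, len(holders))):
            parts[h].extend(chunk.tolist())
    return [np.array(p, dtype=int) for p in parts]
```

With `classes_per_client=1`, every client's local dataset contains a single label, the extreme case in which supplementing with pseudo-images of the missing classes would matter most.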