How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance

Hongkang Li, Shuai Zhang, Yihua Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen

TL;DR

When all group-level covariances are in the medium regime and all group means are close to zero, the learning performance is most desirable: the sample complexity is small, training converges quickly, and both the average and the group-level testing accuracy are high.

Abstract

Group imbalance has been a known problem in empirical risk minimization (ERM), where the achieved high average accuracy is accompanied by low accuracy in a minority group. Despite algorithmic efforts to improve the minority group accuracy, a theoretical generalization analysis of ERM on individual groups remains elusive. By formulating the group imbalance problem with the Gaussian Mixture Model, this paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance. Although our theoretical framework is centered on binary classification using a one-hidden-layer neural network, to the best of our knowledge, we provide the first theoretical analysis of the group-level generalization of ERM in addition to the commonly studied average generalization performance. Sample insights from our theoretical results include that when all group-level covariances are in the medium regime and all group means are close to zero, the learning performance is most desirable in the sense of a small sample complexity, a fast training rate, and a high average and group-level testing accuracy. Moreover, we show that increasing the fraction of the minority group in the training data does not necessarily improve the generalization performance of the minority group. Our theoretical results are validated on both synthetic and empirical datasets, such as CelebA and CIFAR-10 in image classification.
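
The setting described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes isotropic group covariances, a hypothetical random teacher network `W_star` that labels the data, a sigmoid activation, a cross-entropy loss, and arbitrary values for the dimension, width, step size, and mixture fractions. None of these specific choices are taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_gmm(n, fractions, means, stds, rng):
    """Draw n points from a mixture of isotropic Gaussians; return features and group ids."""
    groups = rng.choice(len(fractions), size=n, p=fractions)
    noise = rng.standard_normal((n, means.shape[1]))
    return means[groups] + stds[groups, None] * noise, groups

def forward(W, x):
    """One-hidden-layer network: the average of K sigmoid neurons, output in (0, 1)."""
    return sigmoid(x @ W).mean(axis=1)

def train_erm(x, y, K, steps, lr, rng):
    """Plain gradient descent on the average cross-entropy loss over all samples (ERM)."""
    n, d = x.shape
    W = 0.1 * rng.standard_normal((d, K))
    for _ in range(steps):
        h = sigmoid(x @ W)                           # hidden activations, shape (n, K)
        p = np.clip(h.mean(axis=1), 1e-7, 1 - 1e-7)  # predicted P(y = 1 | x)
        dloss_dp = (p - y) / (p * (1 - p))           # d(cross-entropy)/dp
        W -= lr * x.T @ (dloss_dp[:, None] * h * (1 - h)) / (K * n)
    return W

def accuracies(W, x, y, groups, n_groups):
    """Return the average accuracy and the list of per-group accuracies."""
    correct = (forward(W, x) > 0.5) == (y > 0.5)
    return correct.mean(), [correct[groups == g].mean() for g in range(n_groups)]

# All values below are illustrative assumptions.
rng = np.random.default_rng(0)
d, K = 5, 3
fractions = np.array([0.9, 0.1])     # majority and minority fractions
means = np.zeros((2, d))             # both group means at zero
stds = np.array([1.0, 3.0])          # the minority group has a larger covariance
W_star = rng.standard_normal((d, K)) # hypothetical teacher that defines the labels

x_train, g_train = sample_gmm(2000, fractions, means, stds, rng)
y_train = (forward(W_star, x_train) > 0.5).astype(float)
x_test, g_test = sample_gmm(2000, fractions, means, stds, rng)
y_test = (forward(W_star, x_test) > 0.5).astype(float)

W_hat = train_erm(x_train, y_train, K, steps=300, lr=0.5, rng=rng)
avg_acc, group_accs = accuracies(W_hat, x_test, y_test, g_test, n_groups=2)
print("average accuracy:", avg_acc)
print("group-level accuracy:", group_accs)
```

Varying `fractions`, `means`, or `stds` in this sketch is one way to explore how the average and the group-level accuracy can move differently, which is the kind of question the analysis addresses.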

Paper Structure

This paper contains 45 sections, 19 theorems, 292 equations, 14 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

There exist $\epsilon_0\in(0,\frac{1}{4})$ and positive value functions $\mathcal{B}(\Psi)$ (sample complexity parameter), $q(\Psi)$ (convergence rate parameter), and $\mathcal{E}_w(\Psi)$, $\mathcal{E}(\Psi)$, $\mathcal{E}_l(\Psi)$ (generalization parameters) such that as long as the sample size $n$ ... we have that with probability at least $1-d^{-10}$, the iterates $\{{\boldsymbol W}_t\}_{t=1}^T$ ...

Figures (14)

  • Figure 1: Group imbalance experiment. (a) Binary classification on the CelebA dataset using Gaussian augmentation to control the minority group covariance. (b) Test accuracy against the augmented noise level.
  • Figure 2: The sample complexity when the feature dimension changes.
  • Figure 3: The sample complexity (a) when one mean changes, (b) when one covariance changes.
  • Figure 4: (a) The convergence rate with different ${\boldsymbol \mu}_1$. (b) The convergence rate with different ${\boldsymbol\Sigma}$. (c) Convergence rate when the number of neurons $K$ changes.
  • Figure 5: (a) Convergence rate when the number of neurons $K$ changes. (b) The relative error of the learned model when $n$ changes.
  • ...and 9 more figures
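
The Figure 1 caption mentions using Gaussian augmentation to control the minority group covariance. A minimal sketch of why this works, assuming the augmentation adds independent zero-mean Gaussian noise to each input feature (the caption does not state the exact procedure): noise with standard deviation $\sigma$ raises each feature's variance by $\sigma^2$. The function name `gaussian_augment` and all numbers below are hypothetical.

```python
import numpy as np

def gaussian_augment(x, noise_std, rng):
    """Add independent zero-mean Gaussian noise with the given standard deviation."""
    return x + noise_std * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
base_std = 0.5                                         # assumed minority feature std
minority = base_std * rng.standard_normal((20000, 4))  # synthetic stand-in for minority features

measured = {}
for noise_std in (0.0, 0.5, 1.0):
    augmented = gaussian_augment(minority, noise_std, rng)
    # Mean per-feature variance; theory predicts base_std**2 + noise_std**2.
    measured[noise_std] = augmented.var(axis=0).mean()
    print(noise_std, measured[noise_std], base_std**2 + noise_std**2)
```

The noise level therefore acts as a single knob on the minority group's covariance, which matches the role the caption describes for the horizontal axis in panel (b).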

Theorems & Definitions (24)

  • Theorem 1
  • Corollary 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Definition 1
  • Definition 2
  • Lemma 1
  • Lemma 4
  • Lemma 5
  • ...and 14 more