Ultra-imbalanced classification guided by statistical information

Yin Jin; Ningtao Wang; Ruofan Wu; Pengfei Shi; Xing Fu; Weiqiang Wang

TL;DR

The paper develops a novel learning objective, termed Tunable Boosting Loss, that is provably resistant to data imbalance under ultra-imbalanced classification (UIC); its empirical efficiency is verified by extensive experiments on both public and industrial datasets.

Abstract

Imbalanced data are frequently encountered in real-world classification tasks. Previous work on imbalanced learning has mostly focused on learning with a minority class of few samples. However, the notion of imbalance also applies to cases where the minority class contains abundant samples, which is usually the case for industrial applications such as fraud detection in financial risk management. In this paper, we take a population-level approach to imbalanced learning by proposing a new formulation called ultra-imbalanced classification (UIC). Under UIC, loss functions behave differently even when an infinite number of training samples is available. To understand the intrinsic difficulty of UIC problems, we borrow ideas from information theory and establish a framework for comparing loss functions through the lens of statistical information. We develop a novel learning objective, termed Tunable Boosting Loss, that is provably resistant to data imbalance under UIC; its empirical efficiency is verified by extensive experiments on both public and industrial datasets.

Paper Structure (14 sections, 3 theorems, 60 equations, 6 figures, 8 tables)

Introduction
Motivations and contributions
Related literatures
Ultra-imbalance and statistical information
The UIC formulation
A motivating case: analysis of Gaussian mixture
Numerical results from normal mixture models
A framework for comparing loss functions
Robustness improvements to the alpha loss
Experiments
Experiment setups
Results
Preliminary results on multi-class classification
Conclusions and future works

Key Result

Theorem 2

The following results characterize the population risk minimizer for several losses under the UIC setup: (i) square loss; (ii) erf loss; (iii) alpha loss; (iv) optimality for a special case: if $k_+=k_-=1$, namely, the class-conditional densities are Gaussian, then the linear classifier learned by alpha loss with $\alpha = \frac{1}{2}$ has the optimal AUC among all linear classifiers.
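The loss definitions are not reproduced in this summary. As a minimal sketch, assuming the alpha loss takes the commonly used tunable form $\ell_\alpha(p) = \frac{\alpha}{\alpha-1}\left(1 - p^{1-1/\alpha}\right)$, where $p$ is the sigmoid probability assigned to the true label, the Python snippet below checks numerically that $\alpha = \frac{1}{2}$ coincides with the exponential loss and that $\alpha \to 1$ approaches the cross-entropy (logistic) loss. The function name `alpha_loss` and the margin grid are illustrative choices, not taken from the paper.

```python
import numpy as np


def alpha_loss(margin, alpha):
    """Alpha loss of a signed margin y * f(x), with labels y in {-1, +1}.

    Assumed form (not stated in the summary above):
        l_alpha(p) = alpha / (alpha - 1) * (1 - p ** (1 - 1 / alpha)),
    where p = sigmoid(margin) is the probability given to the true label.
    alpha == 1 is handled as its limit, the logistic (cross-entropy) loss.
    """
    margin = np.asarray(margin, dtype=float)
    if alpha == 1.0:
        # log(1 + exp(-margin)), computed stably
        return np.logaddexp(0.0, -margin)
    p = 1.0 / (1.0 + np.exp(-margin))
    return alpha / (alpha - 1.0) * (1.0 - p ** (1.0 - 1.0 / alpha))


# Illustrative grid of margins
margins = np.linspace(-3.0, 3.0, 7)

exp_loss = np.exp(-margins)                  # exponential loss
half_loss = alpha_loss(margins, 0.5)         # alpha = 1/2
logistic = alpha_loss(margins, 1.0)          # cross-entropy limit
near_one = alpha_loss(margins, 1.0 + 1e-4)   # alpha slightly above 1

print(np.allclose(half_loss, exp_loss))
print(np.allclose(near_one, logistic, rtol=1e-3))
```

Under this assumed form, the special case in item (iv) corresponds to training the linear classifier with the exponential loss.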

Figures (6)

Figure 1: Linear classifiers learned by different losses on two normal clusters. The ratio of minority samples (red) to majority samples (cyan) is 1:1000. Dashed line: linear classifier learned by cross entropy; solid line: linear classifier learned by exponential loss.
Figure 2: The x-axis shows the value of $\alpha$ used to learn the linear classifier, and the y-axis shows the AUC of the learned classifier in this case. The curve is a smooth fit to the results obtained by stochastic gradient descent for different choices of $\alpha$.
Figure 3:
Figure 4:
Figure 6: Line chart showing the effect of the parameter $C$, with confidence intervals. The x-axis shows the denoising parameter in the tunable boosting loss. The solid line and the left y-axis show average accuracy; the dashed line and the secondary y-axis on the right show AUC. The shaded area reflects the standard error of the results. See the text for interpretation.
...and 1 more figure
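Figures 1 and 2 evaluate linear classifiers on normal clusters by AUC. As a rough illustration of that setting, and not the paper's own derivation, the sketch below uses the standard closed form for the population AUC of a linear score $w^\top x$ when the two class-conditional distributions are Gaussian and independent: $\mathrm{AUC}(w) = \Phi\left(w^\top(\mu_+ - \mu_-) / \sqrt{w^\top(\Sigma_+ + \Sigma_-)w}\right)$. The means and covariances are hypothetical values chosen for the example, and `gaussian_linear_auc` is an illustrative helper name.

```python
import math

import numpy as np


def gaussian_linear_auc(w, mu_pos, mu_neg, cov_pos, cov_neg):
    """Population AUC of the score w @ x for two Gaussian classes.

    AUC = P(w @ x_pos > w @ x_neg); for independent Gaussian draws the
    score difference is Gaussian, which gives a normal-CDF expression.
    """
    w = np.asarray(w, dtype=float)
    gap = w @ (np.asarray(mu_pos) - np.asarray(mu_neg))
    spread = math.sqrt(w @ (np.asarray(cov_pos) + np.asarray(cov_neg)) @ w)
    z = gap / spread
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


# Hypothetical class-conditional parameters (not from the paper)
mu_pos = np.array([2.0, 0.0])
mu_neg = np.array([0.0, 0.0])
cov_pos = np.array([[1.0, 0.0], [0.0, 0.25]])
cov_neg = np.array([[1.0, 0.8], [0.8, 1.0]])

# Direction maximizing the ratio above: (cov_pos + cov_neg)^{-1} (mu_pos - mu_neg)
w_best = np.linalg.solve(cov_pos + cov_neg, mu_pos - mu_neg)
# Naive alternative: the difference of class means
w_naive = mu_pos - mu_neg

auc_best = gaussian_linear_auc(w_best, mu_pos, mu_neg, cov_pos, cov_neg)
auc_naive = gaussian_linear_auc(w_naive, mu_pos, mu_neg, cov_pos, cov_neg)
print(f"best direction AUC:  {auc_best:.4f}")
print(f"naive direction AUC: {auc_naive:.4f}")
```

The expression does not involve the class proportions, so the AUC of a fixed linear classifier is unaffected by a ratio such as 1:1000; the imbalance matters only through which classifier a given loss ends up learning.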

Theorems & Definitions (10)

Definition 1
Theorem 2
Definition 3
Definition 4
Definition 5
Definition 6
Definition 7: $f$-function under UIC
Theorem 8
Theorem 9
Remark 1
