
Boosting with Lexicographic Programming: Addressing Class Imbalance without Cost Tuning

Shounak Datta, Sayak Nag, Swagatam Das

TL;DR

It is shown that the assignment of weights to the component classifiers of a boosted ensemble can be thought of as a game of Tug of War between the classes in the margin space, and it is demonstrated how this insight can be used to attain a good compromise between the rare and abundant classes without resorting to cost set tuning.

Abstract

A large amount of research effort has been dedicated to adapting boosting for imbalanced classification. However, boosting methods are yet to be satisfactorily immune to class imbalance, especially for multi-class problems. This is because most of the existing solutions for handling class imbalance rely on expensive cost set tuning for determining the proper level of compensation. We show that the assignment of weights to the component classifiers of a boosted ensemble can be thought of as a game of Tug of War between the classes in the margin space. We then demonstrate how this insight can be used to attain a good compromise between the rare and abundant classes without having to resort to cost set tuning, which has long been the norm for imbalanced classification. The solution is based on a lexicographic linear programming framework which proceeds in two stages. First, class-specific component weight combinations are found so as to minimize a hinge loss individually for each of the classes. Subsequently, the final component weights are assigned so that the maximum deviation from the class-specific minimum loss values (obtained in the previous stage) is minimized. Hence, the proposal is not restricted to two-class situations but is readily applicable to multi-class problems. Additionally, we derive the dual formulation corresponding to the proposed framework. Experiments conducted on artificial and real-world imbalanced datasets, as well as on challenging applications such as hyperspectral image classification and ImageNet classification, establish the efficacy of the proposal.
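The two-stage procedure described in the abstract can be illustrated with a pair of small linear programs. This is a minimal sketch under stated assumptions, not the authors' LexiBoost formulation: it assumes a precomputed matrix `M` with `M[i, t] = +1` when component `t` classifies point `i` correctly and `-1` otherwise (so the signed margin of point `i` is `M[i] @ alpha`), component weights `alpha` constrained to the probability simplex, and a hinge threshold `rho = 0.2`. The helper names `min_mean_hinge` and `lexicographic_weights` are hypothetical, and both LPs are solved with SciPy's `linprog` using the HiGHS backend.

```python
import numpy as np
from scipy.optimize import linprog


def min_mean_hinge(M, rho):
    """Stage-1 style LP: minimise the mean hinge loss of the rows of M.

    Assumed encoding: M[i, t] = +1 if component t is correct on point i,
    else -1. LP variables are [alpha (T), xi (n)]; returns (loss, alpha).
    """
    n, T = M.shape
    cost = np.concatenate([np.zeros(T), np.full(n, 1.0 / n)])
    # Hinge slack: xi_i >= rho - M_i @ alpha  <=>  -M_i @ alpha - xi_i <= -rho
    A_ub = np.hstack([-M, -np.eye(n)])
    b_ub = np.full(n, -rho)
    A_eq = np.concatenate([np.ones(T), np.zeros(n)])[None, :]  # sum(alpha) = 1
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (T + n), method="highs")
    return res.fun, res.x[:T]


def lexicographic_weights(M, y, rho=0.2):
    """Two-stage lexicographic LP (illustrative sketch, hypothetical helper)."""
    n, T = M.shape
    classes = np.unique(y).tolist()
    # Stage 1: best hinge loss each class can attain on its own.
    best = {k: min_mean_hinge(M[y == k], rho)[0] for k in classes}
    # Stage 2 variables: [alpha (T), xi (n), z]; minimise the worst deviation z.
    nv = T + n + 1
    cost = np.zeros(nv)
    cost[-1] = 1.0
    A_ub = np.zeros((n + len(classes), nv))
    b_ub = np.zeros(n + len(classes))
    A_ub[:n, :T] = -M                      # hinge constraints for every point
    A_ub[:n, T:T + n] = -np.eye(n)
    b_ub[:n] = -rho
    for r, k in enumerate(classes):
        idx = np.flatnonzero(y == k)
        A_ub[n + r, T + idx] = 1.0 / len(idx)  # mean slack of class k ...
        A_ub[n + r, -1] = -1.0                 # ... may exceed best[k] by at most z
        b_ub[n + r] = best[k]
    A_eq = np.zeros((1, nv))
    A_eq[0, :T] = 1.0                      # alpha on the probability simplex
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * nv, method="highs")
    alpha = res.x[:T]
    # Report the true per-class hinge loss (slacks of non-binding classes may be loose).
    losses = {k: float(np.maximum(0.0, rho - M[y == k] @ alpha).mean())
              for k in classes}
    return alpha, losses, res.fun


# Toy tug of war: component 0 is right on the 8 majority points and wrong on
# the 2 minority points; component 1 is the opposite.
y = np.array([0] * 8 + [1] * 2)
M = np.where(y[:, None] == np.array([0, 1])[None, :], 1.0, -1.0)

alpha, losses, z = lexicographic_weights(M, y)
_, pooled = min_mean_hinge(M, 0.2)       # single LP over all points, for contrast
print("lexicographic:", alpha.round(3), losses, round(z, 3))
print("pooled:", pooled.round(3))
```

On this toy input, minimizing the pooled hinge loss over all points favors the eight majority points and settles near `alpha = (0.6, 0.4)`, leaving the minority class with a hinge loss of 0.4. The lexicographic program instead balances the tug of war at `alpha ≈ (0.5, 0.5)`, where both classes incur a hinge loss of 0.2, without any class-specific cost being supplied.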

Paper Structure

This paper contains 23 sections, 22 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Toy class-imbalanced dataset along with the classifiers trained in three rounds of boosting.
  • Figure 2: Tug of War between the classes: Solutions A and G respectively correspond to ideal performance on the minority and majority classes, and may not be attainable in practice on most datasets. B and F correspond to the respective best attainable performances with only the outliers having non-ideal signed margins. C, D, and E are trade-off solutions with non-outlier points from both classes having non-ideal signed margins. Solution D yields the optimal trade-off with similar fractions of non-outlier points having zero margin (i.e. prone to being misclassified) from both classes.
  • Figure 3: Average G-Mean values for $k$NN on artificial datasets: LexiBoost and Dual-LexiBoost are compared with the two best contenders.
  • Figure 4: Comparison of the contending methods showing better performance of the proposal on minority classes. Misclassified points are shown in black and correctly-classified points are shown in green.
  • Figure 5: ImageNet subset hierarchy.
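Figure 3 reports results in terms of G-Mean. A common multi-class definition, assumed here because the exact variant used in the paper is not stated in this summary, is the geometric mean of the per-class recalls; it drops to zero whenever any class is entirely misclassified, which is why it is preferred over plain accuracy under imbalance. A minimal sketch (the `g_mean` helper is hypothetical):

```python
import math
from collections import Counter


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (assumed G-Mean variant)."""
    support = Counter(y_true)                        # points per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / support[c] for c in support]
    return math.prod(recalls) ** (1.0 / len(recalls))


y_true = [0] * 8 + [1] * 2
majority_only = [0] * 10                 # 80% accuracy, but minority recall is 0
balanced = [0] * 7 + [1] + [1, 0]        # recalls 7/8 and 1/2
print(g_mean(y_true, majority_only))     # 0.0
print(g_mean(y_true, balanced))          # ≈ 0.661
```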

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4