Methods for Class-Imbalanced Learning with Support Vector Machines: A Review and an Empirical Evaluation

Salim Rezvani, Farhad Pourpanah, Chee Peng Lim, Q. M. Jonathan Wu

TL;DR

This paper surveys SVM-based methods for class-imbalanced learning and proposes a taxonomy with three categories: re-sampling, algorithmic, and fusion methods. It provides an empirical comparison across 36 benchmark datasets, showing that fusion methods generally perform best at the cost of a higher computational load, while algorithmic approaches take less time because they need no data pre-processing. The study identifies key gaps in handling noisy, large-scale, and extremely imbalanced data and suggests directions for future research. Altogether, the work offers practical guidance for selecting SVM-based strategies to address class imbalance.

Abstract

This paper presents a review of methods for class-imbalanced learning with the Support Vector Machine (SVM) and its variants. We first explain the structure of SVM and its variants and discuss their inefficiency in learning with class-imbalanced data sets. We introduce a hierarchical categorization of SVM-based models with respect to class-imbalanced learning. Specifically, we categorize SVM-based models into re-sampling, algorithmic, and fusion methods, and discuss the principles of the representative models in each category. In addition, we conduct a series of empirical evaluations to compare the performance of various representative SVM-based models in each category using benchmark imbalanced data sets, ranging from low to high imbalance ratios. Our findings reveal that while algorithmic methods are less time-consuming because they require no data pre-processing, fusion methods, which combine both re-sampling and algorithmic approaches, generally perform the best, but with a higher computational load. A discussion of research gaps and future research directions is provided.

Paper Structure (23 sections, 43 equations, 3 figures, 8 tables, 1 algorithm)

Figures (3)

  • Figure 1: (a) A hyperplane formed by SVM for an imbalanced data set; (b) an ideal hyperplane that is expected to be formed by SVM-based methods for an imbalanced data set.
  • Figure 2: A hierarchical categorization of SVM-based methods for class-imbalanced learning.
  • Figure 3: Examples of (a) an imbalanced data set, (b) under-sampling, (c) over-sampling, and (d) combined re-sampling.
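
The three re-sampling cases named in the Figure 3 caption can be illustrated with a minimal sketch. It assumes plain random re-sampling on a two-class data set, which is the simplest variant and not necessarily what the methods evaluated in the paper use. The function names `split_by_class` and `resample` are hypothetical, and the choice of the midpoint between the two class counts as the target for the combined case is an assumption made only for illustration.

```python
import random
from collections import Counter


def split_by_class(labels):
    """Map each class label to the list of sample indices carrying it."""
    groups = {}
    for i, y in enumerate(labels):
        groups.setdefault(y, []).append(i)
    return groups


def resample(labels, mode="combined", seed=0):
    """Return sample indices after random re-sampling (hypothetical helper).

    mode="under":    drop majority samples down to the minority count
    mode="over":     duplicate minority samples up to the majority count
    mode="combined": move both classes to the midpoint of the two counts
                     (an assumption; other targets are equally valid)
    """
    rng = random.Random(seed)
    groups = split_by_class(labels)
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two classes")
    minority, majority = sorted(groups.values(), key=len)

    if mode == "under":
        target = len(minority)
    elif mode == "over":
        target = len(majority)
    elif mode == "combined":
        target = (len(minority) + len(majority)) // 2
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Under-sampling: keep a random subset of the majority, without replacement.
    kept_majority = rng.sample(majority, target)
    # Over-sampling: keep every minority sample and add random duplicates.
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return kept_majority + minority + extra


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 10  # 90 majority vs. 10 minority samples
    for mode in ("under", "over", "combined"):
        idx = resample(labels, mode)
        print(mode, dict(Counter(labels[i] for i in idx)))
```

The returned indices can be used to select rows of the feature matrix before training any classifier, including an SVM. Random duplication adds no new information about the minority class, and random removal can discard informative majority samples, which is one reason more elaborate re-sampling schemes exist.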