EvoSampling: A Granular Ball-based Evolutionary Hybrid Sampling with Knowledge Transfer for Imbalanced Learning

Wenbin Pei, Ruohao Dai, Bing Xue, Mengjie Zhang, Qiang Zhang, Yiu-Ming Cheung, Shuyin Xia

TL;DR

The paper tackles imbalanced classification by introducing EvoSampling, a two-stage hybrid sampling framework that first uses multi-task genetic programming to generate diverse, high-quality minority instances and then applies granular ball computing to perform multi-granularity undersampling and noise removal. Knowledge transfer between related GP tasks accelerates evolution and improves sample quality. Experimental results on 20 datasets show EvoSampling consistently improves AUC and G_Mean across AdaBoost, GBDT, RF, and SVM, outperforming several baseline samplers. The work demonstrates the value of combining evolutionary generation of samples with granular, robust undersampling for practical imbalanced learning, while noting the computational demands of multi-task GP. Future work will focus on speeding up the GP component and exploring broader transfer strategies.
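For readers unfamiliar with G_Mean, the sketch below computes it under the common definition for binary problems: the geometric mean of the true positive rate and the true negative rate. This summary does not state the paper's exact formula, so the definition is an assumption, and the function name `g_mean` and its `positive` parameter are illustrative only.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TPR) and specificity (TNR).

    Assumes a binary problem; `positive` marks the minority class label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Guard against a class being absent from y_true.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)
```

Because the two rates are multiplied, a classifier that always predicts the majority class scores 0 regardless of its accuracy, which is why this metric is often reported alongside AUC on imbalanced data.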

Abstract

Class imbalance leads to biased classifiers that favor the majority class and disadvantage the minority class. In practice, however, the minority class is important in many real-life applications. Hybrid sampling methods address this by oversampling the minority class to increase the number of its instances, followed by undersampling to remove low-quality instances. However, most existing sampling methods have difficulty generating diverse, high-quality instances and often fail to remove noise or low-quality instances effectively on a larger scale. This paper therefore proposes an evolutionary multi-granularity hybrid sampling method, called EvoSampling. During the oversampling process, genetic programming (GP) is used with multi-task learning to generate diverse, high-quality instances effectively and efficiently. During the undersampling process, we develop a granular ball-based undersampling method that removes noise in a multi-granular fashion, thereby enhancing data quality. Experiments on 20 imbalanced datasets demonstrate that EvoSampling improves the performance of various classification algorithms by providing better datasets than existing sampling methods. In addition, ablation studies indicate that allowing knowledge transfer accelerates the GP's evolutionary learning process.

Paper Structure

This paper contains 19 sections, 9 equations, 9 figures, 8 tables, 1 algorithm.

Figures (9)

  • Figure 1: The two types of oversampling models. (a) SMOTE-based approaches require predefining the structure and obtaining neighborhood information to generate an instance between two points. (b) GP-based approaches do not require a predefined structure and can adaptively generate unconstrained instances.
  • Figure 2: The flowchart of GP.
  • Figure 3: The generation process of GBs. (a) The initial GB. (b) and (c) GBs in the intermediate process. (d) The final GBs.
  • Figure 4: The framework of EvoSampling. The original dataset is oversampled using the multi-task GP to balance data. The GBC is then used to perform multi-granularity undersampling on the oversampled data to remove low-quality instances.
  • Figure 5: A synthetic instance represented by a GP individual.
  • ...and 4 more figures
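
To make the contrast in Figure 1 concrete, the sketch below illustrates the SMOTE-style model from panel (a): pick a minority instance, find its nearest minority neighbours, and place a new instance on the line segment between the two points. This is a generic illustration of that interpolation idea, not the paper's GP-based generator; the function name `smote_like_sample` and the parameters `k` and `rng` are hypothetical.

```python
import random


def smote_like_sample(minority, k=2, rng=None):
    """Generate one synthetic point between a minority instance and a neighbour.

    `minority` is a list of equal-length numeric lists (at least two points).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = rng or random.Random()

    base_idx = rng.randrange(len(minority))
    base = minority[base_idx]

    # Neighbourhood information: the k nearest other minority points
    # by squared Euclidean distance.
    others = [p for i, p in enumerate(minority) if i != base_idx]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
    neighbour = rng.choice(others[:k])

    # Predefined structure: linear interpolation with a random weight in [0, 1).
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(base, neighbour)]
```

Every point produced this way is constrained to a segment joining two existing minority instances, which is the restriction that panel (b) says GP-based generation avoids.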