Kernel-Based Enhanced Oversampling Method for Imbalanced Classification
Wenjie Li, Sibo Zhu, Zhijian Li, Hanlin Wang
TL;DR
This work targets imbalanced classification by enhancing the Synthetic Minority Over-sampling Technique (SMOTE) with kernel-weighted convex combinations, yielding synthetic minority samples that better reflect the true distribution. The proposed method, KWSMOTE, uses Gaussian kernel weights to bias sampling toward nearby minority points and their neighbors, so that generated samples lie within the convex hull of the minority region and, when appropriate, near class boundaries. Each synthetic sample is the convex combination $x_{ik} = \sum_{j=0}^{k} \nu_{ij} x_i^{(j)}$, with normalized weights $\nu_{ij} = w_{ij}/D_{ik}$, normalizer $D_{ik} = \sum_{j=0}^{k} w_{ij}$, and kernel weights $w_{ij} = K(x_i, x_i^{(j)})$, where $K(x, x') = \exp(-\|x - x'\|^2/(2\sigma^2))$. Evaluations on four real-world datasets (Blood Transfusion, Haberman, Breast Cancer Wisconsin (Diagnostic), and Diabetes) with Random Forest and SVM classifiers show that KWSMOTE consistently outperforms training on the raw data, SMOTE, SNOCC, BorderlineSMOTE, and SVMSMOTE in F1-score, G-mean, and AUC, with notable improvements in minority-class recall and boundary preservation. The results demonstrate the method’s robustness and practical value for imbalanced classification, and suggest future integration into deep learning pipelines and high-dimensional tasks.
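The kernel-weighted convex combination above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that $x_i^{(0)}$ is the minority point $x_i$ itself, that $x_i^{(1)}, \dots, x_i^{(k)}$ are its $k$ nearest minority neighbors under Euclidean distance, and that one synthetic sample is produced per call. The function names `gaussian_kernel` and `kernel_weighted_sample` are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_weighted_sample(minority, i, k, sigma=1.0):
    """Return (x_ik, nu) for minority point i.

    Assumption (not stated in the source): x_i^(0) is x_i itself and
    x_i^(1..k) are its k nearest minority neighbors.
    """
    x_i = minority[i]
    # Rank the other minority points by squared Euclidean distance to x_i.
    others = [p for idx, p in enumerate(minority) if idx != i]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x_i, p)))
    points = [x_i] + others[:k]

    # w_ij = K(x_i, x_i^(j)); w_i0 = 1, so the normalizer D_ik is at least 1.
    weights = [gaussian_kernel(x_i, p, sigma) for p in points]
    d_ik = sum(weights)
    nu = [w / d_ik for w in weights]

    # x_ik = sum_j nu_ij * x_i^(j), computed coordinate by coordinate.
    x_ik = [sum(n * p[d] for n, p in zip(nu, points)) for d in range(len(x_i))]
    return x_ik, nu


# Toy minority set: three clustered points and one distant point.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
synthetic, nu = kernel_weighted_sample(minority, i=0, k=2, sigma=1.0)
```

Because the weights are nonnegative and sum to one, the synthetic point stays inside the convex hull of $x_i$ and the selected neighbors. A smaller $\sigma$ concentrates the weight on $x_i$ and its closest neighbors.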
Abstract
This paper introduces a novel oversampling technique designed to improve classification performance on imbalanced datasets. The proposed method enhances the traditional SMOTE algorithm by incorporating convex combinations and kernel-based weighting to generate synthetic samples that better represent the minority class. Through experiments on multiple real-world datasets, we demonstrate that the new technique outperforms existing methods in terms of F1-score, G-mean, and AUC, providing a robust solution for handling imbalanced datasets in classification tasks.
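To show why F1-score and G-mean are reported rather than accuracy alone, the sketch below computes both from binary confusion-matrix counts. It assumes the usual binary definitions, with the minority class as the positive class and G-mean as the square root of sensitivity times specificity. This section does not spell out the paper's exact definitions, and the function name `f1_and_gmean` and the counts are hypothetical.

```python
import math


def f1_and_gmean(tp, fp, fn, tn):
    """F1-score and G-mean from binary confusion-matrix counts.

    Assumes the minority class is the positive class. Ratios with a zero
    denominator are treated as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    gmean = math.sqrt(recall * specificity)
    return f1, gmean


# A classifier that always predicts the majority class on a 10/90 split:
# accuracy is 0.9, yet both metrics are zero.
f1_majority, gmean_majority = f1_and_gmean(tp=0, fp=0, fn=10, tn=90)

# A classifier that recovers 8 of the 10 minority samples at the cost of
# 12 false positives.
f1_balanced, gmean_balanced = f1_and_gmean(tp=8, fp=12, fn=2, tn=78)
```

Both metrics collapse to zero when the minority class is ignored, which is why they are informative for comparing oversampling methods on imbalanced data.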
