Enhancing Imbalance Learning: A Novel Slack-Factor Fuzzy SVM Approach

M. Tanveer, Anushka Tiwari, Mushir Akhtar, C. T. Lin

TL;DR

The paper tackles the challenge of learning under severe class imbalance for SVM-based classifiers by extending slack-factor fuzzy SVM (SFFSVM) with a novel location parameter $a$ that constrains the DEC hyperplane. This ISFFSVM combines slack-factor-based fuzzy memberships with a tunable boundary constraint to reduce minority misclassification while controlling minority and majority-point influence. Extensive experiments on KEEL datasets show ISFFSVM achieving higher F1, MCC, and AUC-PR than baselines, and a schizophrenia dataset confirms practical gains in medical diagnostics. A sensitivity analysis reveals the performance depends on $a$ and that adaptive or data-driven tuning of $a$ could further enhance robustness. Overall, ISFFSVM provides a principled, effective approach for imbalanced learning with comparable computational cost to SFFSVM.

Abstract

In real-world applications, class-imbalanced datasets pose significant challenges for machine learning algorithms, such as support vector machines (SVMs), particularly in effectively managing imbalance, noise, and outliers. Fuzzy support vector machines (FSVMs) address class imbalance by assigning varying fuzzy memberships to samples; however, their sensitivity to imbalanced datasets can lead to inaccurate assessments. The recently developed slack-factor-based FSVM (SFFSVM) improves traditional FSVMs by using slack factors to adjust fuzzy memberships based on misclassification likelihood, thereby rectifying misclassifications induced by the hyperplane obtained via different error cost (DEC). Building on SFFSVM, we propose an improved slack-factor-based FSVM (ISFFSVM) that introduces a novel location parameter. This novel parameter significantly advances the model by constraining the DEC hyperplane's extension, thereby mitigating the risk of misclassifying minority class samples. It ensures that majority class samples with slack factor scores approaching the location threshold are assigned lower fuzzy memberships, which enhances the model's discrimination capability. Extensive experimentation on a diverse array of real-world KEEL datasets demonstrates that the proposed ISFFSVM consistently achieves higher F1-scores, Matthews correlation coefficients (MCC), and area under the precision-recall curve (AUC-PR) compared to baseline classifiers. Consequently, the introduction of the location parameter, coupled with the slack-factor-based fuzzy membership, enables ISFFSVM to outperform traditional approaches, particularly in scenarios characterized by severe class disparity. The code for the proposed model is available at https://github.com/mtanveer1/ISFFSVM.
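
The abstract reports F1-score and MCC rather than plain accuracy. As a rough illustration of why such metrics are common under class imbalance, the following minimal sketch computes accuracy, F1, and MCC from confusion counts. The helper names and the toy labels are assumptions made for this example and are not taken from the paper or its repository; AUC-PR is left out because it needs ranking scores rather than hard predictions.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the minority class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (hypothetical): 95 majority samples (label 0), 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
always_majority = [0] * 100
counts_trivial = confusion_counts(y_true, always_majority)

# A classifier that finds 4 of the 5 minority samples at the cost of 5 false alarms.
finds_minority = [1] * 5 + [0] * 90 + [1] * 4 + [0]
counts_useful = confusion_counts(y_true, finds_minority)

for name, c in [("always-majority", counts_trivial), ("finds-minority", counts_useful)]:
    print(name, "acc=%.3f" % accuracy(*c), "f1=%.3f" % f1_score(*c[:3]), "mcc=%.3f" % mcc(*c))
```

In this toy case the always-majority classifier has the higher accuracy yet scores zero on both F1 and MCC, while the second classifier is rewarded for recovering minority samples.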

Paper Structure

This paper contains 16 sections, 15 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Decision boundary visualization for SFFSVM and proposed ISFFSVM. (a) Illustration of decision hyperplane and slack factors for SFFSVM model. Points $A$ and $B$ are equidistant from the ideal hyperplane $M^*$, while differing in their importance for hyperplane construction. (b) Improved decision boundary of proposed ISFFSVM model. Introduction of location parameter $a$ ensures correctly classified minority samples are less likely to be misclassified.
  • Figure 2: Impact of the location parameter $a$ on the membership values of majority class samples. Points $A$, $B$, and $C$ with slack factor values less than $a$ are assigned one membership value, whereas points $E$ and $G$ with slack factor values between $a$ and $2$ are assigned lower membership values.
  • Figure 3: Comparison of decision hyperplanes for SFFSVM and proposed ISFFSVM. (a) Decision surface of SFFSVM on moon dataset containing $1000$ majority class data points (blue dots) and $200$ minority class data points (red dots) (i.e., $IR = 5$). (b) Decision surface of ISFFSVM on the same dataset. The proposed ISFFSVM model demonstrates superior classification of minority class samples due to the introduction of the location parameter $a$, which adjusts membership values more effectively.
  • Figure 4: Sensitivity analysis of the location parameter $a$ for the proposed ISFFSVM model, showing the F1-score values corresponding to different values of $a$ (ranging from 1.1 to 2) across four datasets: Pima, Haberman, Yeast3, and Ecoli1.
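
The Figure 3 caption describes a two-moons dataset with $1000$ majority and $200$ minority points ($IR = 5$). A dataset of that shape could be sketched as follows, using only the standard library. The generator, the noise level, the random seed, and the $\pm 1$ label convention are assumptions made for illustration; the paper's actual data-generation settings are not stated in this summary.

```python
import math
import random


def make_imbalanced_moons(n_majority=1000, n_minority=200, noise=0.1, seed=0):
    """Generate two interleaving half-circles with unequal class sizes.

    Majority points get label -1 and minority points get label +1
    (an assumed convention for this sketch).
    """
    rng = random.Random(seed)
    points, labels = [], []
    # Upper half-circle: majority class.
    for _ in range(n_majority):
        t = math.pi * rng.random()
        x = math.cos(t) + rng.gauss(0.0, noise)
        y = math.sin(t) + rng.gauss(0.0, noise)
        points.append((x, y))
        labels.append(-1)
    # Shifted lower half-circle: minority class.
    for _ in range(n_minority):
        t = math.pi * rng.random()
        x = 1.0 - math.cos(t) + rng.gauss(0.0, noise)
        y = 0.5 - math.sin(t) + rng.gauss(0.0, noise)
        points.append((x, y))
        labels.append(+1)
    return points, labels


def imbalance_ratio(labels, minority=+1):
    """IR = (number of majority samples) / (number of minority samples)."""
    n_min = sum(1 for y in labels if y == minority)
    n_maj = len(labels) - n_min
    return n_maj / n_min


points, labels = make_imbalanced_moons()
print(len(points), imbalance_ratio(labels))
```

Fixing the seed keeps the sketch reproducible; changing `n_majority` or `n_minority` changes the imbalance ratio accordingly.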