
Robust Support Vector Machines for Imbalanced and Noisy Data via Benders Decomposition

Seyed Mojtaba Mohasel, Hamidreza Koosha

TL;DR

This work tackles SVM performance degradation under class imbalance and noise by formulating a mixed-integer SVM that counts margin violations with binary variables and prioritizes samples near the decision boundary. Solved via Benders decomposition, the approach alternates between a subproblem (hard-margin SVM with kernelization) and a master problem that iteratively selects high-priority samples to refine the boundary. This shifts the boundary in favor of the minority class and improves robustness to outliers. Across binary datasets from OpenML, the method yields higher minority-class F1 and overall accuracy than Soft Margin SVM, Weighted SVM, and NuSVC. It also uses fewer support vectors and achieves competitive prediction times, at the cost of higher training time. The open-source implementation supports practical deployment in imbalanced or noisy classification tasks; future work will extend the method to regression and multiclass problems and analyze the effect of different kernels.
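The summary does not state the model explicitly. As a point of reference, the contrast between penalizing violation magnitude and counting violations can be written as below. This is a generic big-M sketch, not the paper's exact formulation: the constant M, the binary indicators z_i and the penalty weight C are assumptions, and the per-sample priorities and extra constraints described in the abstract are not reproduced.

```latex
% Soft Margin SVM: penalizes the total magnitude of margin violations
\[
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0
\]

% Count-based variant (generic big-M sketch): penalizes the number of violations
\[
\min_{\mathbf{w},\, b,\, \mathbf{z}} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} z_i
\quad \text{s.t.} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - M z_i, \quad z_i \in \{0, 1\}
\]
```

Once the binary vector z is fixed, and assuming M is large enough that the relaxed constraints never bind, the second problem reduces to a hard-margin SVM over the samples with z_i = 0. This is one way to see why a Benders scheme can pair a hard-margin SVM subproblem with a master problem that chooses the binary variables.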

Abstract

This study introduces a novel formulation to enhance Support Vector Machines (SVMs) in handling class imbalance and noise. Unlike the conventional Soft Margin SVM, which penalizes the magnitude of constraint violations, the proposed model quantifies the number of violations and aims to minimize their frequency. To achieve this, a binary variable is incorporated into the objective function of the primal SVM formulation, replacing the traditional slack variable. Furthermore, each misclassified sample is assigned a priority and an associated constraint. The resulting formulation is a mixed-integer programming model, efficiently solved using Benders decomposition. The proposed model's performance was benchmarked against existing models, including Soft Margin SVM, weighted SVM, and NuSVC. Two primary hypotheses were examined: 1) The proposed model improves the F1-score for the minority class in imbalanced classification tasks. 2) The proposed model enhances classification accuracy in noisy datasets. These hypotheses were evaluated using a Wilcoxon test across multiple publicly available datasets from the OpenML repository. The results supported both hypotheses (\( p < 0.05 \)). In addition, the proposed model exhibited several interesting properties, such as improved robustness to noise, a decision boundary shift favoring the minority class, a reduced number of support vectors, and decreased prediction time. The open-source Python implementation of the proposed SVM model is available.
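To make the contrast between the two penalties concrete, here is a minimal pure-Python sketch for a fixed linear classifier with weights w and bias b. The function names are hypothetical and are not taken from the authors' open-source implementation. The toy data include one far-away mislabeled point, which dominates the hinge (slack) total but counts as only one violation.

```python
from typing import Sequence


def margins(w: Sequence[float], b: float,
            X: Sequence[Sequence[float]], y: Sequence[int]) -> list[float]:
    """Functional margins y_i * (w . x_i + b) of a linear classifier."""
    return [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for xi, yi in zip(X, y)]


def hinge_penalty(w, b, X, y) -> float:
    """Soft-margin penalty: total slack, sum_i max(0, 1 - margin_i)."""
    return sum(max(0.0, 1.0 - m) for m in margins(w, b, X, y))


def violation_count(w, b, X, y) -> int:
    """Count-based penalty: number of samples with margin_i < 1."""
    return sum(1 for m in margins(w, b, X, y) if m < 1.0)


if __name__ == "__main__":
    # Toy 1-D data; the last point (x = 10, label -1) acts as an outlier.
    w, b = [1.0], 0.0
    X = [[2.0], [0.5], [-3.0], [10.0]]
    y = [1, 1, 1, -1]
    print(hinge_penalty(w, b, X, y))    # 15.5 (the outlier alone contributes 11.0)
    print(violation_count(w, b, X, y))  # 3
```

Under the hinge penalty, moving the boundary to reduce the outlier's large slack can pay off even if it harms many well-placed points. Under the count penalty, the outlier costs the same as any other single violation, which is consistent with the robustness to noise reported above.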

Paper Structure

This paper contains 13 sections, 29 equations, 10 figures, 5 tables, 4 algorithms.

Figures (10)

  • Figure 1: Schematic of the solution procedure for the proposed model using Benders decomposition, which splits the original problem into a subproblem and a master problem.
  • Figure 2: Flowchart of the proposed model
  • Figure 3: Comparison of the proposed model's decision boundary evolution and the Soft Margin SVM approach.
  • Figure 4: Comparison of data distribution and different model training approaches.
  • Figure 5: Comparing F1-score (left) and accuracy (right) of Soft Margin SVM, NuSVC, and the proposed model across datasets.
  • ...and 5 more figures