Cost-sensitive Feature Selection for Support Vector Machines

Sandra Benítez-Peña; Rafael Blanquero; Emilio Carrizosa; Pepa Ramírez-Cobo

TL;DR

This work tackles feature selection in binary classification under asymmetric misclassification costs by embedding the selection step into Support Vector Machines. The authors formulate a two-stage procedure: a mixed-integer linear program (P1) first minimizes the total feature cost subject to bounds on the empirical true positive rate ($TPR$) and true negative rate ($TNR$); the selected features are then used to train the SVM, either with a linear kernel (P2) or with a kernelized approach (P3) in which the kernel is adapted to the chosen feature subset $z$. Empirical results on several UCI datasets show substantial feature reduction with controlled performance losses, and the Hoeffding-based thresholding scheme often improves accuracy at the cost of sparsity; comparisons with related methods indicate competitive results with explicit cost-sensitivity. The approach offers a practical path to sparse, cost-aware SVMs that lower measurement costs and model complexity, and it suggests extensions to regression, multiclass fusion, and more sophisticated kernel adaptations.
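To make the thresholding idea concrete, the following is a minimal sketch that computes the empirical $TPR$ and $TNR$ and then tightens a target rate using the standard one-sided Hoeffding inequality. The function names (`rates`, `hoeffding_margin`, `tightened_threshold`) and the confidence parameter `alpha` are illustrative assumptions, and the exact rule used in the paper may differ.

```python
import math


def rates(y_true, y_pred):
    """Return (TPR, TNR) for labels coded as +1 / -1."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    tpr = sum(p == 1 for p in pos) / len(pos)
    tnr = sum(p == -1 for p in neg) / len(neg)
    return tpr, tnr


def hoeffding_margin(n, alpha):
    """Slack eps solving exp(-2 * n * eps**2) = alpha (one-sided Hoeffding bound)."""
    return math.sqrt(math.log(1.0 / alpha) / (2.0 * n))


def tightened_threshold(target, n, alpha):
    """Empirical rate to require so the true rate is >= target with prob. >= 1 - alpha.

    Assumption: n is the number of training records in the relevant class.
    The result is capped at 1.0 because a rate cannot exceed 1.
    """
    return min(1.0, target + hoeffding_margin(n, alpha))


if __name__ == "__main__":
    tpr, tnr = rates([1, 1, -1, -1], [1, -1, -1, -1])
    print(tpr, tnr)
    print(tightened_threshold(0.80, n=200, alpha=0.05))
```

Because the tightened threshold is stricter than the nominal target, fewer feature subsets satisfy it, which is consistent with the reported trade-off between accuracy and sparsity.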

Abstract

Feature Selection is a crucial procedure in Data Science tasks such as Classification, since it identifies the relevant variables, thus making the classification procedures more interpretable, cheaper in terms of measurement, and more effective by reducing noise and overfitting. The relevance of features in a classification procedure is linked to the fact that misclassification costs are frequently asymmetric, since false positive and false negative cases may have very different consequences. However, off-the-shelf Feature Selection procedures seldom take such cost-sensitivity of errors into account. In this paper we propose a mathematical-optimization-based Feature Selection procedure embedded in one of the most popular classification procedures, namely Support Vector Machines, accommodating asymmetric misclassification costs. The key idea is to replace the traditional margin maximization by minimizing the number of features selected, while imposing upper bounds on the false positive and false negative rates. The problem is written as an integer linear problem plus a quadratic convex problem for Support Vector Machines with both linear and radial kernels. The reported numerical experience demonstrates the usefulness of the proposed Feature Selection procedure. Indeed, our results on benchmark data sets show that a substantial decrease in the number of features is obtained, whilst the desired trade-off between false positive and false negative rates is achieved.
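As a rough illustration of the key idea, a feature-selection problem of this kind can be sketched as a mixed-integer linear program. The notation below is assumed for illustration and is not taken from the paper: $c_k$ is the cost of feature $k$, $z_k \in \{0,1\}$ indicates whether feature $k$ is selected, $\zeta_i \in \{0,1\}$ indicates whether record $i$ is correctly classified, $I_+$ and $I_-$ are the index sets of the positive and negative records, $\lambda_+$ and $\lambda_-$ are the required rates, and $M$ is a sufficiently large constant.

$$
\begin{aligned}
\min_{w,\,\beta,\,z,\,\zeta}\quad & \sum_{k} c_k z_k \\
\text{s.t.}\quad & y_i\,(w^\top x_i + \beta) \ge 1 - M(1-\zeta_i), && i \in I_+ \cup I_-, \\
& \sum_{i \in I_+} \zeta_i \ge \lambda_+\,|I_+|, \\
& \sum_{i \in I_-} \zeta_i \ge \lambda_-\,|I_-|, \\
& -M z_k \le w_k \le M z_k, && \text{for all } k, \\
& z_k \in \{0,1\},\quad \zeta_i \in \{0,1\}.
\end{aligned}
$$

With $c_k = 1$ for every $k$ the objective counts the selected features. Requiring a true positive rate of at least $\lambda_+$ is the same as bounding the false negative rate above by $1-\lambda_+$, and likewise for $\lambda_-$ and the false positive rate, which is how upper bounds on the error rates translate into the constraints on correctly classified records.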

Paper Structure (11 sections, 18 equations, 10 tables, 1 algorithm)

Introduction
Cost-sensitive Feature Selection
The cost-sensitive FS procedure
Cost-sensitive sparse SVMs: linear vs arbitrary kernels
Experiment Description
Numerical Results
Data description
Results under the cost-sensitive sparse SVM with linear kernel
Results under the cost-sensitive sparse SVM with radial kernel
Comparison with other methodologies
Concluding remarks
