
Feature selection in linear SVMs via a hard cardinality constraint: a scalable SDP decomposition approach

Immanuel Bomze, Federico D'Onofrio, Laura Palagi, Bo Peng

TL;DR

The paper tackles embedded feature selection for linear SVMs under a hard sparsity budget. It formulates FS-SVM as two MIQP models (BigMP and CoP) and develops scalable SDP relaxations, including decomposed variants that exploit sparsity. It introduces two practical upper-bound strategies (Local Search and Kernel Search) and an exact algorithm that solves a sequence of MISOCPs, enabling global solutions on sizable datasets. Compared with baseline approaches such as $\ell_1$-FS-SVM and elastic-net SVM, the proposed methods achieve strong bounds and competitive or superior predictive accuracy, while giving direct control over the feature budget. The approach enhances interpretability by guaranteeing a fixed number of selected features while maintaining robust classification performance, with potential extensions to non-linear SVMs and related interpretability-focused models.
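For orientation, the block below gives a standard big-M MIQP for cardinality-constrained linear SVMs. It shows the general shape of a BigMP-type model. It is an assumption, not the paper's exact formulation. Only $\mathsf w$, $b$, $\bm\xi$ and $M$ echo symbols used in Theorem 3.1. The hinge-loss weight $C$, the feature budget $B$, the binary indicators $u_j$, and the data $(\mathbf x_i, y_i)$ with $m$ samples and $n$ features are illustrative notation.

$$
\begin{aligned}
\min_{\mathsf w,\, b,\, \bm\xi,\, \mathbf u}\quad & \tfrac12\|\mathsf w\|_2^2 + C\sum_{i=1}^{m}\xi_i\\
\text{s.t.}\quad & y_i\big(\mathsf w^\top \mathbf x_i + b\big) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i=1,\dots,m,\\
& -M u_j \le \mathsf w_j \le M u_j,\qquad j=1,\dots,n,\\
& \textstyle\sum_{j=1}^{n} u_j \le B,\qquad \mathbf u\in\{0,1\}^n.
\end{aligned}
$$

Setting $u_j=0$ forces $\mathsf w_j=0$, so at most $B$ weights can be nonzero. Dropping the indicators leaves the box $|\mathsf w_j|\le M$. A bound of this kind is presumably what the parameter $M$ in Theorem 3.1 controls.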

Abstract

In this paper, we study the embedded feature selection problem in linear Support Vector Machines (SVMs), in which a cardinality constraint is employed, leading to an interpretable classification model. The problem is NP-hard due to the presence of the cardinality constraint, even though the original linear SVM amounts to a problem solvable in polynomial time. To handle the hard problem, we first introduce two mixed-integer formulations for which novel semidefinite relaxations are proposed. Exploiting the sparsity pattern of the relaxations, we decompose the problems and obtain equivalent relaxations in a much smaller cone, making the conic approaches scalable. To make the best use of the decomposed relaxations, we propose heuristics that exploit information from their optimal solutions. Moreover, an exact procedure is proposed by solving a sequence of mixed-integer decomposed semidefinite optimization problems. Numerical results on classical benchmark datasets are reported, showing the efficiency and effectiveness of our approach.
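The heuristics described in the abstract search over feature supports of fixed size. The sketch below is a generic illustration under stated assumptions. It starts from the $B$ features with the largest scores, for example values read off a relaxation solution. It then runs a first-improvement 1-swap local search. All function names are hypothetical, and the paper's actual Local Search and Kernel Search rules may differ. In practical use, `evaluate` would retrain an SVM on the candidate features and return a value such as `fs_svm_objective`. Here, `evaluate` is any callable.

```python
from itertools import product
from typing import Callable, FrozenSet, Sequence

# Hypothetical helpers for illustration; not the paper's implementation.

def fs_svm_objective(w: Sequence[float], b: float,
                     X: Sequence[Sequence[float]], y: Sequence[int],
                     C: float) -> float:
    """Return 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))."""
    reg = 0.5 * sum(wj * wj for wj in w)
    hinge = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xij for wj, xij in zip(w, xi)) + b)
        hinge += max(0.0, 1.0 - margin)
    return reg + C * hinge

def top_b_support(scores: Sequence[float], B: int) -> FrozenSet[int]:
    """Keep the B features with the largest |score| (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda j: (-abs(scores[j]), j))
    return frozenset(order[:B])

def swap_local_search(support: FrozenSet[int], n: int,
                      evaluate: Callable[[FrozenSet[int]], float],
                      max_rounds: int = 100) -> FrozenSet[int]:
    """First-improvement 1-swap local search; the support size stays fixed."""
    best_val = evaluate(support)
    for _ in range(max_rounds):
        improved = False
        outside = [j for j in range(n) if j not in support]
        # Try swapping one selected feature out and one unselected feature in.
        for j_out, j_in in product(sorted(support), outside):
            candidate = (support - {j_out}) | {j_in}
            val = evaluate(candidate)
            if val < best_val:
                support, best_val = candidate, val
                improved = True
                break  # accept the first improving swap, then rescan
        if not improved:
            break  # no improving swap: support is 1-swap locally optimal
    return support
```

As a toy check, give each feature a fixed cost and start from `top_b_support([0.9, 0.1, 0.8, 0.2, 0.3], B=2)`, which is `{0, 2}`. With `costs = [5.0, 1.0, 4.0, 2.0, 3.0]`, the search then moves to the two cheapest features, `{1, 3}`. A first-improvement rule keeps each round cheap. A best-improvement rule would scan the whole neighborhood before moving.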

Paper Structure (21 sections, 9 theorems, 36 equations, 2 figures, 9 tables, 4 algorithms)

Key Result

Theorem 3.1

If the parameter $M$ in problem (box-l2-FS-SVM1) satisfies …, where $({\mathsf w}^*,b^*,{\bm{\xi}}^*)$ is an optimal solution of problem (l2-SVM), then (box-l2-FS-SVM1) is equivalent to (l2-SVM).

Figures (2)

  • Figure 1: Comparison of solving BigMP and CoP with Gurobi for the small datasets: mean computational times (in seconds)
  • Figure 2: Comparison of the mean MipGap values of the solution found by Gurobi solving CoP after 1 hour, and the Relaxed Gap of the Heuristic Kernel Search solution using the solution of DSCoP as a lower bound

Theorems & Definitions (22)

  • Remark 2.1
  • Remark 2.2
  • Theorem 3.1
  • proof
  • Remark 3.2
  • Theorem 3.3
  • proof
  • Theorem 3.4
  • proof
  • Theorem 3.5
  • ...and 12 more