Solving the Best Subset Selection Problem via Suboptimal Algorithms

Vikram Singh, Min Sun

TL;DR

The paper tackles the NP-hard best subset selection problem in linear regression under a cardinality constraint $\|\beta\|_{0} \le k$ by evaluating suboptimal approaches. It surveys four established suboptimal methods—Forward Selection, Sequential Forward Floating Selection, Discrete First Order, and Genetic Algorithm—and introduces a new sequential feature swapping (SFS) algorithm that iteratively replaces weaker predictors with stronger outsiders. Through extensive synthetic and real-data experiments, the study shows that SFS variants, particularly SFS1 and SFS2, deliver strong solution quality with favorable CPU-time trade-offs, though performance depends on data structure (constant vs exponential correlation) and problem regime (OD vs UD). The results provide practical guidance for scalable high-dimensional BSS and are accompanied by reproducible code. Overall, SFS-based approaches emerge as robust, efficient alternatives for suboptimal yet effective subset selection in large-scale settings.
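To make the idea of "replacing weaker predictors with stronger outsiders" concrete, the sketch below pairs plain forward selection with a generic one-for-one swap refinement under the constraint $\|\beta\|_{0} \le k$. This is not the paper's SFS1 or SFS2: the swap rule, the stopping criterion, and the names `rss`, `forward_selection`, and `swap_refine` are assumptions chosen only for illustration.

```python
import numpy as np

def rss(X, y, support):
    """Residual sum of squares of the least-squares fit restricted to `support`."""
    cols = sorted(support)
    coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    resid = y - X[:, cols] @ coef
    return float(resid @ resid)

def forward_selection(X, y, k):
    """Greedy forward selection: k times, add the column that lowers RSS the most."""
    support = set()
    for _ in range(k):
        candidates = [j for j in range(X.shape[1]) if j not in support]
        best = min(candidates, key=lambda j: rss(X, y, support | {j}))
        support.add(best)
    return support

def swap_refine(X, y, support, max_passes=20):
    """Hypothetical swap step (an assumption, not the paper's rule):
    exchange one selected column for one outsider whenever that strictly
    lowers RSS, and stop once no single swap helps."""
    support = set(support)
    best = rss(X, y, support)
    for _ in range(max_passes):
        improved = False
        for i in sorted(support):
            for j in range(X.shape[1]):
                if j in support:
                    continue
                trial = (support - {i}) | {j}
                val = rss(X, y, trial)
                if val < best - 1e-12:
                    # Accept the first improving swap and restart the scan.
                    support, best, improved = trial, val, True
                    break
            if improved:
                break
        if not improved:
            break
    return support, best

# Illustrative synthetic data: 3 true predictors out of 12.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 12))
beta_true = np.zeros(12)
beta_true[[1, 4, 7]] = [2.0, -3.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(60)

start = forward_selection(X, y, 3)
refined, refined_rss = swap_refine(X, y, start)
```

Because a swap is accepted only when it strictly lowers the residual sum of squares, the refined support is never worse than the starting one and always keeps exactly $k$ predictors.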

Abstract

Best subset selection in linear regression is well known to be nonconvex and computationally challenging to solve, as the number of possible subsets grows rapidly with increasing dimensionality of the problem. As a result, finding the global optimal solution via an exact optimization method for a problem with dimensions of 1000s may take an impractical amount of CPU time. This suggests the importance of finding suboptimal procedures that can provide good approximate solutions using much less computational effort than exact methods. In this work, we introduce a new procedure and compare it with other popular suboptimal algorithms to solve the best subset selection problem. Extensive computational experiments using synthetic and real data have been performed. The results provide insights into the performance of these methods in different data settings. The new procedure is observed to be a competitive suboptimal algorithm for solving the best subset selection problem for high-dimensional data.
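The growth in the number of candidate subsets can be checked directly. The short sketch below counts the supports an exhaustive search would have to consider; the helper names and the values of `p` and `k` are illustrative assumptions, not settings taken from the paper.

```python
from math import comb

def n_subsets(p: int, k: int) -> int:
    """Number of supports with exactly k of the p predictors."""
    return comb(p, k)

def n_subsets_up_to(p: int, k: int) -> int:
    """Number of supports with at most k of the p predictors."""
    return sum(comb(p, s) for s in range(k + 1))

# Count size-10 supports as the number of predictors p grows.
for p in (20, 100, 1000):
    print(p, n_subsets(p, 10))
```

Even at a fixed subset size, the count grows polynomially in `p` with degree `k`, which is why exhaustive enumeration quickly becomes impractical.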

Paper Structure

This paper contains 27 sections, 3 theorems, 18 equations, 9 figures, 6 tables, 5 algorithms.

Key Result

Proposition 2.1

If $\hat{\beta}$ is an optimal solution to the following problem, then it can be computed as follows: $\hat{\beta}$ retains the $k$ largest (in absolute value) elements of $c \in \mathbb{R}^{p}$ and sets the rest to zero, i.e., if $|c_{(1)}| \geq |c_{(2)}| \geq \ldots \geq |c_{(p)}|$ denote the ordered absolute values of the entries of $c$, then $\hat{\beta}_{i} = c_{i}$ if $i \in \{(1), \ldots, (k)\}$ and $\hat{\beta}_{i} = 0$ otherwise, where $\hat{\beta}_{i}$ is the $i$th coord
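The operation described in the proposition, keeping the $k$ largest-magnitude entries of $c$ and zeroing the rest, can be sketched as follows. The display of the optimization problem is missing from this listing, so the sketch assumes it is the projection of $c$ onto $\{\beta : \|\beta\|_{0} \le k\}$ in Euclidean distance. The function name `hard_threshold` and the tie-breaking rule (lower index wins) are also assumptions.

```python
import numpy as np

def hard_threshold(c, k):
    """Keep the k largest-magnitude entries of c and set the others to zero."""
    c = np.asarray(c, dtype=float)
    beta = np.zeros_like(c)
    if k <= 0:
        return beta
    k = min(k, c.size)
    # Stable sort on -|c| so that ties are broken in favor of the lower index
    # (an assumption; any tie-breaking rule gives an optimal solution).
    keep = np.argsort(-np.abs(c), kind="stable")[:k]
    beta[keep] = c[keep]
    return beta

example = hard_threshold([0.5, -3.0, 2.0, 0.1], 2)
```

Here the two entries of largest magnitude, $-3.0$ and $2.0$, are retained in their original positions and the remaining entries become zero.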

Figures (9)

  • Figure 1: Box plots of Relative Gap $\%$ for examples 1 (left), 2 (middle), and 3 (right) in OD case with constant correlation in small, medium, and large dimension regimes with four SNR values and the two $k$ values where (1) SFS1; (2) SFS2; (3) FS; (4) SFFS; (5) GA; (6) DFOn
  • Figure 2: Box plots of Relative Gap $\%$ for examples 1 (left), 2 (middle), and 3 (right) in UD case with constant correlation in small, medium, and large dimension regimes with four SNR values and the two $k$ values where (1) SFS1; (2) SFS2; (3) FS; (4) SFFS; (5) GA; (6) DFOn
  • Figure 3: Performance profiles of CPU time for examples 1, 2, and 3 combined in OD case with constant correlation in small, medium, and large dimension regimes with four SNR values and the two $k$ values
  • Figure 4: Performance profiles of CPU time for examples 1, 2, and 3 combined in UD case with constant correlation in small, medium, and large dimension regimes with four SNR values and the two $k$ values
  • Figure 5: Box plots of Relative Gap $\%$ for examples 1 (left), 2 (middle), and 3 (right) in OD case with exponential correlation in small, medium, and large dimension regimes with four SNR values and the two $k$ values where (1) SFS1; (2) SFS2; (3) FS; (4) SFFS; (5) GA; (6) DFOn
  • ...and 4 more figures

Theorems & Definitions (7)

  • Proposition 2.1: Proposition 3, bertsimasEtal:2015
  • Proposition 2.2: Proposition 4, bertsimasEtal:2015
  • Proposition 3.1
  • proof
  • Example 1
  • Example 2
  • Example 3