Solving the Best Subset Selection Problem via Suboptimal Algorithms
Vikram Singh, Min Sun
TL;DR
The paper tackles the NP-hard best subset selection (BSS) problem in linear regression under a cardinality constraint $\|\beta\|_{0} \le k$ by evaluating suboptimal approaches. It surveys four established suboptimal methods (Forward Selection, Sequential Forward Floating Selection, Discrete First Order, and Genetic Algorithm) and introduces a new sequential feature swapping (SFS) algorithm that iteratively replaces weaker selected predictors with stronger ones from outside the current subset. Extensive experiments on synthetic and real data show that the SFS variants, particularly SFS1 and SFS2, deliver strong solution quality with favorable CPU-time trade-offs. Performance does, however, depend on the correlation structure of the data (constant vs. exponential correlation) and on the problem regime: overdetermined (OD) vs. underdetermined (UD). The results offer practical guidance for scalable high-dimensional BSS, and the accompanying code makes them reproducible. Overall, SFS-based approaches emerge as robust, efficient alternatives for approximate subset selection in large-scale settings.
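The swapping idea summarized above can be illustrated with a minimal local-search sketch. This is not the authors' SFS algorithm, whose exact update and stopping rules are not given here: the function names `rss`, `forward_selection`, and `swap_search`, the forward-selection warm start, and the best-improvement swap rule are all hypothetical choices made for illustration. The objective is the residual sum of squares $\|y - X\beta\|_2^2$, searched over supports of exactly $k$ predictors; since adding a predictor never increases the least-squares RSS, restricting to size $k$ loses nothing under $\|\beta\|_{0} \le k$ when $k \le p$.

```python
import numpy as np

def rss(X, y, support):
    """Residual sum of squares of the least-squares fit restricted to `support`."""
    if len(support) == 0:
        return float(y @ y)
    Xs = X[:, sorted(support)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ beta
    return float(r @ r)

def forward_selection(X, y, k):
    """Greedy forward selection: repeatedly add the predictor that most lowers the RSS."""
    support = set()
    for _ in range(k):
        candidates = [j for j in range(X.shape[1]) if j not in support]
        best = min(candidates, key=lambda j: rss(X, y, support | {j}))
        support.add(best)
    return support

def swap_search(X, y, k, max_iter=100):
    """Hypothetical swap-based local search (illustration only, not the paper's SFS).

    Starting from the forward-selection subset, evaluate every exchange of one
    selected predictor for one unselected predictor, apply the swap that lowers
    the RSS the most, and stop when no single swap improves the fit.
    """
    support = forward_selection(X, y, k)
    current = rss(X, y, support)
    p = X.shape[1]
    for _ in range(max_iter):
        best_swap, best_val = None, current
        for i in support:                      # weaker "insider" to drop
            for j in range(p):                 # stronger "outsider" to add
                if j in support:
                    continue
                val = rss(X, y, (support - {i}) | {j})
                if val < best_val - 1e-12:     # strict improvement only
                    best_swap, best_val = (i, j), val
        if best_swap is None:                  # local optimum w.r.t. single swaps
            break
        i, j = best_swap
        support = (support - {i}) | {j}
        current = best_val
    return sorted(support), current

# Usage on synthetic data with three true predictors (indices 2, 7, 11).
rng = np.random.default_rng(0)
n, p, k = 100, 20, 3
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[2, 7, 11]] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
support, value = swap_search(X, y, k)
print(support, value)
```

Each pass of this sketch performs $k(p-k)$ least-squares refits, so its cost grows quickly with $p$; the CPU-time trade-offs reported for the SFS variants refer to the paper's own algorithms, not to this sketch.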
Abstract
Best subset selection in linear regression is well known to be nonconvex and computationally challenging, because the number of possible subsets grows rapidly with the dimensionality of the problem. As a result, finding the globally optimal solution with an exact optimization method for a problem with thousands of predictors may take an impractical amount of CPU time. This motivates suboptimal procedures that provide good approximate solutions with far less computational effort than exact methods. In this work, we introduce a new procedure and compare it with other popular suboptimal algorithms for the best subset selection problem. Extensive computational experiments on synthetic and real data have been performed, and the results provide insights into the performance of these methods across different data settings. The new procedure is observed to be a competitive suboptimal algorithm for solving the best subset selection problem for high-dimensional data.
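To make the growth in the number of subsets concrete: with $p$ predictors, the number of supports allowed by $\|\beta\|_{0} \le k$ is $\sum_{j=0}^{k}\binom{p}{j}$. The short sketch below evaluates this count for $k = 10$; the function name `num_feasible_supports` is hypothetical and used only for illustration.

```python
from math import comb

def num_feasible_supports(p: int, k: int) -> int:
    """Count supports S with |S| <= k among p predictors, i.e. the
    sparsity patterns allowed by the constraint ||beta||_0 <= k."""
    return sum(comb(p, j) for j in range(k + 1))

for p in (20, 100, 1000):
    print(f"p={p:5d}, k=10: {num_feasible_supports(p, 10):.3e} supports")
```

For $p = 1000$ and $k = 10$ the count already exceeds $10^{23}$, far beyond what exhaustive enumeration can handle.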
