abess: A Fast Best Subset Selection Library in Python and R

Jin Zhu, Xueqin Wang, Liyuan Hu, Junhao Huang, Kangkang Jiang, Yanhang Zhang, Shiyun Lin, Junxian Zhu

TL;DR

abess is a fast, unified library for best-subset selection (BSS) across linear regression, classification, and PCA. It is built on a splicing technique that certifiably attains the optimal solution in polynomial time with high probability under the linear model, even though BSS is NP-hard in general. The architecture modularizes data, algorithms, and evaluation, offering seven BSS tasks, group-structure support, and $\ell_2$-regularization variants, with a C++ core and Python/R interfaces; the Python interface integrates with scikit-learn, and the R library is distributed on CRAN. Empirical results show accuracy competitive with existing solvers at runtimes that match or are up to 20x faster, including speedups over scikit-learn's $\ell_1$-based methods and over elasticnet for sparse PCA (SPCA). The library is open-source under GPL-3, cross-platform, and designed for ease of use and extension, with documentation and CI in place.

Abstract

We introduce a new library named abess that implements a unified framework of best-subset selection for solving diverse machine learning problems, e.g., linear regression, classification, and principal component analysis. In particular, abess certifiably attains the optimal solution in polynomial time with high probability under the linear model. Our efficient implementation allows abess to solve best-subset selection problems as fast as, or even 20x faster than, existing competing variable (model) selection toolboxes. Furthermore, it supports common variants such as best group subset selection and $\ell_2$-regularized best-subset selection. The core of the library is programmed in C++. For ease of use, a Python library is designed to integrate conveniently with scikit-learn, and it can be installed from the Python Package Index. In addition, a user-friendly R library is available at the Comprehensive R Archive Network. The source code is available at: https://github.com/abess-team/abess.
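
To make the problem concrete: for linear regression, best-subset selection with support size $s$ seeks the coefficient vector that minimizes $\|y - X\beta\|_2^2$ subject to $\|\beta\|_0 = s$. The following is a minimal sketch that solves this by exhaustive enumeration with NumPy. It is not the splicing algorithm used by abess and does not call the abess API; the function name `best_subset_exhaustive` and the synthetic data are hypothetical, chosen only for illustration.

```python
import itertools

import numpy as np


def best_subset_exhaustive(X, y, support_size):
    """Brute-force best-subset selection (illustrative only, not abess).

    Fits least squares on every subset of exactly `support_size` columns
    and keeps the subset with the smallest residual sum of squares.
    """
    n, p = X.shape
    best_rss, best_support, best_coef = np.inf, None, None
    for support in itertools.combinations(range(p), support_size):
        cols = list(support)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_support, best_coef = rss, support, coef
    # Embed the fitted coefficients in a length-p sparse vector.
    beta = np.zeros(p)
    beta[list(best_support)] = best_coef
    return best_support, beta


# Hypothetical synthetic data: a linear model with three nonzero coefficients.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[[1, 4, 7]] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.1 * rng.standard_normal(n)

support, beta_hat = best_subset_exhaustive(X, y, support_size=3)
print("selected columns:", support)
```

The loop visits every one of the $\binom{p}{s}$ candidate subsets, so its cost grows combinatorially in $p$. That is the cost a polynomial-time method such as the one described in the abstract is meant to avoid.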

Paper Structure

This paper contains 5 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: abess software architecture.
  • Figure 2: Using the abess R library on a synthetic data set to demonstrate its optimality. The data set comes from a linear model with the true sparse coefficients given by beta.
  • Figure 3: Example of using the abess Python library with scikit-learn.