Dynamic Incremental Optimization for Best Subset Selection

Shaogang Ren; Xiaoning Qian

TL;DR

The paper addresses generalized best subset selection with non-convex $\ell_0$ regularization by deriving a dual formulation and solving it via a primal–dual algorithm. It introduces dual-range estimation and an active incremental strategy that employs safe screening to prune inactive features and gradually grow the active set, achieving substantial computational savings. The authors establish strong duality, analyze convergence and polynomial-time complexity, and demonstrate competitive or improved efficiency on synthetic linear-regression data and the News20 dataset. The approach yields higher-quality sparse estimators with significantly reduced computational cost in high-dimensional sparse learning tasks.
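
To make the idea of gradually growing an active set concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the simplified objective $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda_0\|\beta\|_0$, swaps the paper's primal-dual updates for plain iterative hard thresholding, and replaces the safe screening rule with a heuristic single-coordinate gain test. All function names and parameters (`incremental_l0`, `batch`, `max_rounds`, and so on) are hypothetical.

```python
import numpy as np


def l0_objective(X, y, beta, lam0):
    """Assumed objective: (1/2n)||y - X beta||^2 + lam0 * ||beta||_0."""
    resid = y - X @ beta
    return 0.5 * (resid @ resid) / X.shape[0] + lam0 * np.count_nonzero(beta)


def hard_threshold(z, lam0, step):
    """Proximal map of step*lam0*||.||_0: zero entries with z^2 <= 2*step*lam0."""
    out = z.copy()
    out[z ** 2 <= 2.0 * step * lam0] = 0.0
    return out


def iht_on_active_set(X, y, beta, active, lam0, n_iter=300):
    """Iterative hard thresholding restricted to the columns listed in `active`.

    This stands in for the paper's primal-dual inner solver (an assumption).
    """
    n = X.shape[0]
    Xa = X[:, active]
    # Step size 1/L, where L is the Lipschitz constant of the restricted gradient.
    step = n / np.linalg.norm(Xa, 2) ** 2
    b = beta[active].copy()
    for _ in range(n_iter):
        grad = Xa.T @ (Xa @ b - y) / n
        b = hard_threshold(b - step * grad, lam0, step)
    out = np.zeros_like(beta)
    out[active] = b
    return out


def incremental_l0(X, y, lam0, batch=5, max_rounds=20):
    """Grow the active set a few features at a time, re-solving after each growth."""
    n, p = X.shape
    beta = np.zeros(p)
    active = np.zeros(p, dtype=bool)
    col_scale = np.maximum((X ** 2).sum(axis=0) / n, 1e-12)
    for _ in range(max_rounds):
        grad = X.T @ (X @ beta - y) / n
        # Largest loss decrease achievable by moving one inactive coordinate.
        gain = grad ** 2 / (2.0 * col_scale)
        gain[active] = 0.0
        top = np.argsort(gain)[::-1][:batch]
        # Heuristic inclusion test: the decrease must outweigh the lam0 penalty.
        new = top[gain[top] > lam0]
        if new.size == 0:
            break
        active[new] = True
        beta = iht_on_active_set(X, y, beta, np.flatnonzero(active), lam0)
    return beta
```

Calling `incremental_l0(X, y, lam0)` on a design matrix `X` and response `y` returns a sparse coefficient vector. The inner solver only ever touches the active columns, which is where an incremental strategy would save work when most features are irrelevant. Unlike the dual-range-based screening described in the paper, the gain test here carries no safety guarantee.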

Abstract

Best subset selection is considered the 'gold standard' for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-smooth non-convex problem. In this paper, we investigate the dual forms of a family of $\ell_0$-regularized problems. An efficient primal-dual algorithm is developed based on the primal and dual problem structures. By leveraging the dual range estimation along with the incremental strategy, our algorithm potentially reduces redundant computation and improves the solutions of best subset selection. Theoretical analysis and experiments on synthetic and real-world datasets validate the efficiency and statistical properties of the proposed solutions.

Paper Structure (21 sections, 10 theorems, 102 equations, 4 figures, 3 algorithms)

Introduction
Properties of Generalized Sparse Learning
Dual Problem
Dual Variable Estimation
Algorithm
Primal-dual Updating for Linear Regression
Improve Efficiency with Active Incremental Strategy
Algorithm Analysis
Experiments
Simulation Study
News20 Dataset
Discussion
Conclusion
Additional Remarks
Outer Loop Analysis
...and 6 more sections

Key Result

Theorem 2.1

Assume that the primal loss functions $\{l_i(\cdot)\}_{i=1}^n$ are $1/\mu$-strongly smooth. The range of the dual variable is bounded via the duality gap value, i.e., $\forall \alpha \in \mathcal{F}^n, \beta \in \mathbb{R}^p$, $\{B(\alpha; r) : || \alpha - \bar{\alpha}||_2 \leq r, r = \sqrt{ \frac

Figures (4)

Figure 1: Running time, duality gap, nonzero number, PSSR, and estimation error for four algorithms on simulated data with $SNR \in \{20, 5, 2\}$. x-axis is the number of training samples, from 200 to 600.
Figure 2: The running time, primal objective $P(\hat{\beta})$, duality gap, and nonzero number for different methods on News20 dataset.
Figure (unnumbered): Inner solver with primal-dual updating
Figure (unnumbered): Feature Inclusion

Theorems & Definitions (25)

Definition 2.1
Theorem 2.1
Theorem 4.1
Remark 4.1
Lemma B.1
proof
Theorem B.2
proof
Remark B.1
proof
...and 15 more
