Adaptive sieving: A dimension reduction technique for sparse optimization problems

Yancheng Yuan, Meixia Lin, Defeng Sun, Kim-Chuan Toh

TL;DR

The paper introduces adaptive sieving (AS), a dimension-reduction strategy for sparse convex composite problems of the form $\min_x \{\Phi(x)+P(x)\}$. Instead of solving the full problem, AS solves a sequence of much smaller reduced problems obtained by iteratively pruning inactive features. AS is agnostic to the specific regularizer, relies on proximal mappings, and allows inexact solves of the reduced problems, with a proximal residual $R(x)$ guiding termination. The authors provide a finite-termination analysis and develop an AS-based path-generation method for solving $\min_x \{\Phi(x)+\lambda P(x)\}$ across a grid of $\lambda$ values, yielding approximate solutions with controlled residuals. Extensive numerical experiments on synthetic and real data demonstrate substantial speedups (often by a factor of tens) and robust performance across linear and logistic regression models, including Lasso, SLOPE, exclusive lasso, and sparse group lasso. The results highlight AS's practical impact for large-scale sparse learning and model selection via efficient solution-path generation.
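To make the termination criterion concrete, here is a minimal sketch of a proximal residual for the Lasso case. It assumes $\Phi(x)=\tfrac12\|Ax-b\|^2$, $P(x)=\lambda\|x\|_1$, and the common unit-step definition $R(x) = x - \mathrm{Prox}_P(x - \nabla\Phi(x))$; the summary above does not spell out the formula, so treat this definition and the function names `soft_threshold` and `lasso_prox_residual` as assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal mapping of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_prox_residual(A, b, x, lam):
    """Assumed residual R(x) = x - prox_P(x - grad Phi(x)) for the Lasso."""
    grad = A.T @ (A @ x - b)          # gradient of 0.5 * ||Ax - b||^2
    return x - soft_threshold(x - grad, lam)

# With A = I the Lasso solution is available in closed form,
# so the residual can be checked by hand.
A = np.eye(3)
b = np.array([3.0, 0.5, -2.0])
lam = 1.0
x_opt = soft_threshold(b, lam)        # exact minimizer when A is the identity
x_zero = np.zeros(3)

res_opt = np.linalg.norm(lasso_prox_residual(A, b, x_opt, lam))
res_zero = np.linalg.norm(lasso_prox_residual(A, b, x_zero, lam))
```

The residual vanishes exactly at a minimizer and is nonzero elsewhere, which is why its norm is a natural stopping measure when reduced problems are solved inexactly.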

Abstract

In this paper, we propose an adaptive sieving (AS) strategy for solving general sparse machine learning models by effectively exploring the intrinsic sparsity of the solutions, wherein only a sequence of reduced problems with much smaller sizes need to be solved. We further apply the proposed AS strategy to generate solution paths for large-scale sparse optimization problems efficiently. We establish the theoretical guarantees for the proposed AS strategy including its finite termination property. Extensive numerical experiments are presented in this paper to demonstrate the effectiveness and flexibility of the AS strategy to solve large-scale machine learning models.
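The idea of solving only a sequence of small reduced problems can be illustrated with a rough sketch for the Lasso. This is not the authors' implementation: the inner solver (plain proximal gradient, `ista`), the initial index choice (largest entries of $|A^\top b|$), and the cap `grow` on how many indices are added per round are all simplifying assumptions, and `sieve_lasso` is a hypothetical name.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal mapping of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, z0, max_iter=20000, tol=1e-12):
    """Plain proximal gradient; a stand-in for any reduced-problem solver."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    z = z0.copy()
    for _ in range(max_iter):
        z_new = soft_threshold(z - step * (A.T @ (A @ z - b)), step * lam)
        if np.linalg.norm(z_new - z) <= tol:
            return z_new
        z = z_new
    return z

def sieve_lasso(A, b, lam, init_size=2, grow=10, tol=1e-6, max_rounds=50):
    """Sieving-style loop: solve on a small index set, then enlarge it."""
    n = A.shape[1]
    active = np.zeros(n, dtype=bool)
    # Assumed initial guess: columns most correlated with b.
    active[np.argsort(-np.abs(A.T @ b))[:init_size]] = True
    x = np.zeros(n)
    res_norm = np.inf
    for _ in range(max_rounds):
        idx = np.flatnonzero(active)
        # Reduced problem uses only the selected columns (warm-started).
        x[idx] = ista(A[:, idx], b, lam, x[idx])
        grad = A.T @ (A @ x - b)
        res_norm = np.linalg.norm(x - soft_threshold(x - grad, lam))
        if res_norm <= tol:
            break
        # Excluded coordinates violate optimality when |grad_j| > lam.
        violation = np.where(active, 0.0, np.abs(grad) - lam)
        candidates = np.argsort(-violation)[:grow]
        new = candidates[violation[candidates] > 0]
        if new.size == 0:
            break
        active[new] = True
    return x, active, res_norm

rng = np.random.default_rng(0)
m, n = 100, 400
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:5] = [4.0, -3.0, 5.0, -4.0, 3.0]
b = A @ x_true
lam = 0.2 * np.max(np.abs(A.T @ b))
x_hat, active, res_norm = sieve_lasso(A, b, lam)
print(active.sum(), "of", n, "columns used; residual norm", res_norm)
```

Because the index set only grows and has at most $n$ elements, a loop of this shape stops after finitely many rounds, which mirrors the finite-termination property mentioned in the abstract.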

Paper Structure

This paper contains 27 sections, 3 theorems, 32 equations, 8 figures, 13 tables, 2 algorithms.

Key Result

Proposition 1

For any $s = 0, 1, \ldots$, the updating rule of $x^s$ in Algorithm alg:screening can be interpreted as follows. Let $M_s$ be the linear map from $\mathbb{R}^{|I^s|}$ to $\mathbb{R}^n$ defined by $(M_s z)_{I^s} = z$ and $(M_s z)_j = 0$ for $j \notin I^s$, and let $\Phi^s$, $P^s$ be functions from $\mathbb{R}^{|I^s|}$ to $\mathbb{R}$ defined as $\Phi^s(z) := \Phi(M_s z)$, $P^s(z) := P(M_s z)$ for all $z\in \mathbb{R}^{|I^s|}$. Then $x^s\in$ ...
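The reduced functions $\Phi^s$ and $P^s$ can be checked numerically in a small sketch. It assumes that $M_s$ pads $z$ with zeros outside $I^s$ (the definition is missing from the extracted text above), and it uses the Lasso choices $\Phi(x)=\tfrac12\|Ax-b\|^2$ and $P(x)=\lambda\|x\|_1$; the helper names `embed`, `phi`, and `penalty` are hypothetical.

```python
import numpy as np

def embed(z, index_set, n):
    """Assumed action of M_s: place z on index_set, zeros elsewhere."""
    x = np.zeros(n)
    x[index_set] = z
    return x

def phi(A, b, x):
    """Least-squares loss 0.5 * ||Ax - b||^2."""
    r = A @ x - b
    return 0.5 * float(r @ r)

def penalty(x, lam):
    """Lasso regularizer lam * ||x||_1."""
    return lam * float(np.abs(x).sum())

rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.3
I_s = np.array([1, 4, 6])             # illustrative index set I^s
z = rng.standard_normal(I_s.size)

x = embed(z, I_s, n)
phi_full = phi(A, b, x)               # Phi(M_s z)
phi_reduced = phi(A[:, I_s], b, z)    # same value using only |I^s| columns
p_full = penalty(x, lam)              # P(M_s z)
p_reduced = penalty(z, lam)           # same value on the reduced vector
```

Under these assumptions, evaluating $\Phi^s$ only touches the columns of $A$ indexed by $I^s$, which is where the dimension reduction comes from.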

Figures (8)

  • Figure 1: Performance of the AS strategy applied to the generation of solution paths for the Lasso linear regression model on synthetic data sets with different problem sizes.
  • Figure 2: Performance profile of the AS strategy on the Lasso linear regression model for the case when $m=500,n=100,000$.
  • Figure 3: Performance of the AS strategy applied to the generation of solution paths for the exclusive lasso linear regression model on synthetic data sets with different problem sizes.
  • Figure 4: Performance profile of the AS strategy on the exclusive lasso linear regression model for the case when $m=500,n=100,000$.
  • Figure 5: Comparison among different approaches in generating solution paths for the Lasso model of size $(m,n)=(500,100000)$ with $\lambda_c$ decreasing from $1$ to $10^{-4}$.
  • ...and 3 more figures

Theorems & Definitions (5)

  • Proposition 1
  • proof
  • Theorem 1
  • proof
  • Theorem 2