Adaptive sieving: A dimension reduction technique for sparse optimization problems

Yancheng Yuan; Meixia Lin; Defeng Sun; Kim-Chuan Toh

TL;DR

The paper introduces adaptive sieving (AS), a dimension-reduction strategy for sparse convex composite problems of the form $\min_x \{\Phi(x)+P(x)\}$. Instead of solving the full problem, AS solves a sequence of much smaller reduced problems obtained by iteratively pruning inactive features. AS is agnostic to the specific regularizer, relies only on proximal mappings, and allows the reduced problems to be solved inexactly, with a proximal residual $R(x)$ guiding termination. The authors provide a finite-termination analysis and develop an AS-based path-generation method for solving $\min_x \{\Phi(x)+\lambda P(x)\}$ across a grid of $\lambda$ values, yielding approximate solutions with controlled residuals. Extensive numerical experiments on synthetic and real data demonstrate substantial speedups (often by a factor of tens) and robust performance across linear and logistic regression models with Lasso, SLOPE, exclusive lasso, and sparse group lasso regularizers. The results highlight AS’s practical impact for large-scale sparse learning and model selection via efficient solution-path generation.
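To make the summary above concrete, here is a minimal sketch of an AS-style loop for the Lasso, with $\Phi(x)=\tfrac12\|Ax-b\|^2$ and $P(x)=\lambda\|x\|_1$. It is an illustration under stated assumptions, not the authors' algorithm as published. The residual is taken to be $R(x)=x-\mathrm{prox}_P(x-\nabla\Phi(x))$, the initial index set is chosen by correlation with $b$, the reduced problems are solved by plain proximal gradient, and features outside the index set are added when their residual entry is nonzero. All function names and these specific choices are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal mapping of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_residual(A, b, lam, x):
    """Assumed residual R(x) = x - prox_P(x - grad Phi(x)) for the Lasso."""
    grad = A.T @ (A @ x - b)
    return x - soft_threshold(x - grad, lam)

def solve_reduced(A_I, b, lam, z0, tol, max_iter=20000):
    """Inexactly solve the reduced Lasso by proximal gradient (an assumed solver)."""
    z = z0.copy()
    step = 1.0 / max(np.linalg.norm(A_I, 2) ** 2, 1e-12)  # 1 / Lipschitz constant
    for _ in range(max_iter):
        grad = A_I.T @ (A_I @ z - b)
        # Stop once the reduced problem's own proximal residual is small.
        if np.linalg.norm(z - soft_threshold(z - grad, lam)) <= tol:
            break
        z = soft_threshold(z - step * grad, step * lam)
    return z

def adaptive_sieving_lasso(A, b, lam, eps=1e-6, init_size=10):
    """Sketch of an AS-style loop: solve on an index set, then add violators."""
    n = A.shape[1]
    idx = np.zeros(n, dtype=bool)
    # Hypothetical initial guess: the features most correlated with b.
    idx[np.argsort(-np.abs(A.T @ b))[:init_size]] = True
    x = np.zeros(n)
    for _ in range(n + 1):  # the index set can grow at most n times
        # Reduced problem: variables outside idx stay fixed at zero.
        x[idx] = solve_reduced(A[:, idx], b, lam, x[idx], tol=eps / 2)
        r = prox_residual(A, b, lam, x)
        if np.linalg.norm(r) <= eps:
            break
        # Add every feature whose residual entry is nonzero.
        idx |= (r != 0)
    return x, idx
```

Because the reduced solve is warm-started from the previous iterate and stopped at half the target tolerance, the full residual falls below `eps` as soon as no feature outside the index set violates the optimality condition.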

Abstract

In this paper, we propose an adaptive sieving (AS) strategy for solving general sparse machine learning models by effectively exploring the intrinsic sparsity of the solutions, wherein only a sequence of reduced problems with much smaller sizes need to be solved. We further apply the proposed AS strategy to generate solution paths for large-scale sparse optimization problems efficiently. We establish the theoretical guarantees for the proposed AS strategy including its finite termination property. Extensive numerical experiments are presented in this paper to demonstrate the effectiveness and flexibility of the AS strategy to solve large-scale machine learning models.

Paper Structure (27 sections, 3 theorems, 32 equations, 8 figures, 13 tables, 2 algorithms)

Introduction
Dimension reduction via adaptive sieving
The adaptive sieving strategy
Examples of the regularizer
Theoretical analysis of the AS strategy
An efficient path generation method based on the AS strategy
Numerical Experiments
Performance of the AS strategy for linear regression on synthetic data
Numerical results on Lasso linear regression problems
Numerical results on exclusive lasso linear regression problems
Performance of the AS strategy for logistic regression on synthetic data
Numerical results on Lasso logistic regression problems
Numerical results on exclusive lasso logistic regression problems
Comparison between AS and other approaches
Flexibility of the AS strategy
...and 12 more sections

Key Result

Proposition 1

For any $s = 0, 1, \ldots$, the updating rule of $x^s$ in Algorithm alg:screening can be interpreted as follows. Let $M_s$ be a linear map from $\mathbb{R}^{|I^s|}$ to $\mathbb{R}^n$ defined as […], and let $\Phi^s$ and $P^s$ be functions from $\mathbb{R}^{|I^s|}$ to $\mathbb{R}$ defined by $\Phi^s(z) := \Phi(M_s z)$ and $P^s(z) := P(M_s z)$ for all $z \in \mathbb{R}^{|I^s|}$. Then $x^s \in$ …
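The reduced functions $\Phi^s(z) = \Phi(M_s z)$ and $P^s(z) = P(M_s z)$ can be illustrated with a short sketch. The defining equation of $M_s$ does not survive in the text above, so the code assumes the natural choice: $M_s$ places $z$ on the index set $I^s$ and fills the remaining coordinates with zeros. That assumption, the helper names `embed` and `reduce_function`, and the toy data are all hypothetical.

```python
import numpy as np

def embed(z, index_set, n):
    """Assumed form of M_s: put z on index_set and zeros everywhere else."""
    x = np.zeros(n)
    x[index_set] = z
    return x

def reduce_function(f, index_set, n):
    """Return the reduced function z -> f(M_s z)."""
    return lambda z: f(embed(z, index_set, n))

# Toy data: a least-squares loss and an l1 regularizer on R^6.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
b = rng.standard_normal(4)
Phi = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
P = lambda x: 0.3 * np.sum(np.abs(x))

I_s = np.array([0, 2, 5])            # hypothetical index set I^s
Phi_s = reduce_function(Phi, I_s, 6)
P_s = reduce_function(P, I_s, 6)
z = np.array([1.0, -2.0, 0.5])
```

Under this assumption, `Phi_s(z)` equals the least-squares loss restricted to the columns `A[:, I_s]`, so the reduced problem involves only $|I^s|$ variables.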

Figures (8)

Figure 1: Performance of the AS strategy applied to the generation of solution paths for the Lasso linear regression model on synthetic data sets with different problem sizes.
Figure 2: Performance profile of the AS strategy on the Lasso linear regression model for the case when $m=500,n=100,000$.
Figure 3: Performance of the AS strategy applied to the generation of solution paths for the exclusive lasso linear regression model on synthetic data sets with different problem sizes.
Figure 4: Performance profile of the AS strategy on the exclusive lasso linear regression model for the case when $m=500,n=100,000$.
Figure 5: Comparison among different approaches in generating solution paths for the Lasso model of size $(m,n)=(500,100000)$ with $\lambda_c$ decreasing from $1$ to $10^{-4}$.
...and 3 more figures

Theorems & Definitions (5)

Proposition 1
proof
Theorem 1
proof
Theorem 2
