Sparse Polyak with optimal thresholding operators for high-dimensional M-estimation

Tianqi Qiao, Marie Maros

TL;DR

The paper addresses high-dimensional M-estimation under sparsity by generalizing Sparse Polyak, an adaptive step-size method, to work with a broad class of sparsifying operators $\Phi_s$. When the operator's relative concavity $\eta_{s^*}(\Phi_s)$ is sufficiently small, the iterates contract and the convergence rate stays independent of the ambient dimension $d$. In particular, Reciprocal Thresholding (RT) reduces the required sparsity level from $s=O(s^*\bar{\kappa}^2)$ to $s=O(s^*\bar{\kappa})$ and improves final accuracy by a factor of $\bar{\kappa}$. The main theorem establishes the bound $\|\theta_{t+1}-\widehat{\theta}\|^2 \le (1-1/(40\bar{\kappa})+4\eta_{s^*}(\Phi_s))\|\theta_t-\widehat{\theta}\|^2$, which is a strict contraction whenever $\eta_{s^*}(\Phi_s) < 1/(160\bar{\kappa})$. Corollaries for sparse linear regression and generalized linear models (GLMs) give near-optimal statistical precision independent of $d$. Numerical experiments on sparse logistic regression show faster convergence and sparser solutions with RT, supporting the method's scalability to very high-dimensional GLM problems.
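To make the contraction bound concrete, the sketch below unrolls it: if each step multiplies the squared error by $\rho = 1-1/(40\bar{\kappa})+4\eta_{s^*}(\Phi_s)$, then after $T$ steps the squared error is at most $\rho^T$ times its initial value. The function names and the numerical values of $\bar{\kappa}$, $\eta$, and the tolerance are illustrative assumptions, not taken from the paper, and the calculation takes the bound exactly as quoted in the TL;DR.

```python
import math


def contraction_factor(kappa_bar: float, eta: float) -> float:
    """Per-iteration factor rho = 1 - 1/(40*kappa_bar) + 4*eta from the quoted bound."""
    return 1.0 - 1.0 / (40.0 * kappa_bar) + 4.0 * eta


def iterations_to_tolerance(kappa_bar: float, eta: float, eps: float) -> int:
    """Smallest T with rho**T <= eps, i.e. squared error reduced by a factor eps."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    rho = contraction_factor(kappa_bar, eta)
    if not 0.0 < rho < 1.0:
        # rho >= 1 happens exactly when eta >= 1 / (160 * kappa_bar).
        raise ValueError("no contraction: need eta < 1 / (160 * kappa_bar)")
    return math.ceil(math.log(eps) / math.log(rho))


# Illustrative values only (assumed, not from the paper).
rho = contraction_factor(kappa_bar=10.0, eta=0.0)
steps = iterations_to_tolerance(kappa_bar=10.0, eta=0.0, eps=1e-6)
print(f"rho = {rho:.4f}, iterations for 1e-6 reduction: {steps}")
```

Because $\log(1/\rho) \approx 1/(40\bar{\kappa}) - 4\eta$ for small values, the iteration count grows roughly linearly in $\bar{\kappa}$ when $\eta$ is negligible, and it blows up as $\eta$ approaches $1/(160\bar{\kappa})$.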

Abstract

We propose and analyze a variant of Sparse Polyak for high-dimensional M-estimation problems. Sparse Polyak proposes a novel adaptive step-size rule tailored to suitably estimate the problem's curvature in the high-dimensional setting, guaranteeing that the algorithm's performance does not deteriorate when the ambient dimension increases. However, its convergence guarantees can only be obtained by sacrificing solution sparsity and statistical accuracy. In this work, we introduce a variant of Sparse Polyak that retains its desirable scaling properties with respect to the ambient dimension while obtaining sparser and more accurate solutions.
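The general idea of pairing an adaptive Polyak-type step with a sparsifying operator can be sketched as follows. This is a generic illustration, not the paper's Algorithm 1: it uses plain hard thresholding rather than Reciprocal Thresholding, and the step size $\gamma_t = (f(\theta_t)-\widehat{f})/\|\mathrm{HT}_s(\nabla f(\theta_t))\|^2$ is an assumed form whose constants and exact normalization may differ from the paper's rule. All function names, the toy least-squares problem, and its sizes are hypothetical.

```python
import random


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:s])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def loss_and_grad(theta, X, y):
    """Least-squares loss ||X theta - y||^2 / (2n) and its gradient."""
    n = len(y)
    resid = [sum(x * t for x, t in zip(row, theta)) - yi for row, yi in zip(X, y)]
    loss = sum(r * r for r in resid) / (2 * n)
    grad = [sum(row[j] * r for row, r in zip(X, resid)) / n
            for j in range(len(theta))]
    return loss, grad


def polyak_iht(X, y, s, f_hat, iters=100, tol=1e-12):
    """Thresholded gradient steps with an assumed Polyak-type step size."""
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        loss, grad = loss_and_grad(theta, X, y)
        # Assumption: normalize by the squared norm of the s-sparse truncated gradient.
        denom = sum(g * g for g in hard_threshold(grad, s))
        if loss - f_hat <= tol or denom == 0.0:
            break
        gamma = (loss - f_hat) / denom
        theta = hard_threshold([t - gamma * g for t, g in zip(theta, grad)], s)
    return theta


# Toy noiseless problem, so the optimal value f_hat is 0 (assumed known).
rng = random.Random(0)
n, d, s_star, s = 100, 20, 2, 5
theta_true = [1.0, -2.0] + [0.0] * (d - s_star)
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [sum(x * t for x, t in zip(row, theta_true)) for row in X]
theta_out = polyak_iht(X, y, s=s, f_hat=0.0)
print("nonzeros:", sum(1 for t in theta_out if t != 0.0))
```

The step size needs the target value $\widehat{f}$ but no smoothness or curvature constants, which is the appeal of Polyak-type rules; the sketch keeps $s > s^*$ coordinates, mirroring the sparsity relaxation the abstract refers to.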

Paper Structure

This paper contains 11 sections, 4 theorems, 26 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Theorem 1

Let $\{\theta_t\}_{t \geq 1}$ denote the iterates generated by Algorithm 1. Suppose $f$ is convex and satisfies Assumptions asp:rscvx and asp:rsmooth. Let $\widehat{\theta}$ be any $s^{*}$-sparse vector such that $f(\widehat{\theta}) = \widehat{f}$. Assume $s/s^{*}$ is sufficiently large to g[...] If Assumption asp:weak holds instead of Assumption asp:rscvx, set $\gamma_t = \frac{\max\{f(\theta_ [...]

Figures (1)

  • Figure 1: Dashed lines show the HT operator with $s=500$; solid lines show the RT operator with $s=300$.

Theorems & Definitions (5)

  • Theorem 1
  • Corollary 1
  • Corollary 2: Linear Regression
  • Corollary 3: Generalized Linear Models
  • proof