GPU-friendly and Linearly Convergent First-order Methods for Certifying Optimal $k$-sparse GLMs

Jiachang Liu; Andrea Lodi; Soroosh Shafiee

TL;DR

The paper's analysis links primal quadratic growth to dual quadratic decay, yielding error bounds that make the Fenchel duality gap a sharp proxy for progress toward the solution set. Building on this, it develops a duality-gap-based restart scheme that upgrades a broad class of sublinear proximal methods to provably linearly convergent ones.
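The sketch below makes the restart idea concrete. It rests on three assumptions, none of which come from the paper. The test problem is a Lasso instead of the paper's sparse GLM relaxation. The restart rule, which resets FISTA's momentum each time the Fenchel duality gap halves, is hypothetical. The names `soft_threshold`, `duality_gap`, and `restarted_fista` are illustrative. The Lasso was chosen because a feasible dual point can be read off the residual by rescaling, so each gap evaluation costs only matrix-vector products.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def duality_gap(X, y, lam, beta):
    """Fenchel gap for the toy problem min_b 0.5*||Xb - y||^2 + lam*||b||_1."""
    r = X @ beta - y
    primal = 0.5 * (r @ r) + lam * np.abs(beta).sum()
    # Dual candidate from the residual, rescaled so that ||X^T zeta||_inf <= lam.
    zeta = r
    corr = np.max(np.abs(X.T @ zeta))
    if corr > lam:
        zeta = zeta * (lam / corr)
    dual = -0.5 * (zeta @ zeta) - zeta @ y  # -f*(zeta) with f*(z) = 0.5||z||^2 + z^T y
    return primal - dual  # nonnegative by weak duality


def restarted_fista(X, y, lam, tol=1e-10, max_iter=5000):
    """FISTA whose momentum is reset whenever the gap halves (hypothetical rule)."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the smooth part
    beta = np.zeros(X.shape[1])
    z, t = beta.copy(), 1.0
    gap_ref = duality_gap(X, y, lam, beta)  # gap at the last restart
    gap, restarts = gap_ref, 0
    for _ in range(max_iter):
        beta_new = soft_threshold(z - X.T @ (X @ z - y) / L, lam / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = beta_new + ((t - 1.0) / t_new) * (beta_new - beta)
        beta, t = beta_new, t_new
        gap = duality_gap(X, y, lam, beta)
        if gap <= tol:
            break
        if gap <= 0.5 * gap_ref:
            # Restart: drop the momentum and reset the reference gap.
            z, t = beta.copy(), 1.0
            gap_ref = gap
            restarts += 1
    return beta, gap, restarts


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
y = rng.standard_normal(50)
beta, gap, restarts = restarted_fista(X, y, lam=1.0)
print(f"gap={gap:.2e}, restarts={restarts}")
```

The gap is computable at every iterate, so the restart test needs no estimate of the growth constants. A fixed restart period, by contrast, would have to be tuned to them.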

Abstract

We investigate the problem of certifying optimality for sparse generalized linear models (GLMs), where sparsity is enforced through a cardinality constraint. While Branch-and-Bound (BnB) frameworks can certify optimality using perspective relaxations, existing methods for solving these relaxations are computationally intensive, limiting their scalability. To address this challenge, we reformulate the relaxations as composite optimization problems and develop a unified proximal framework that is both linearly convergent and computationally efficient. Under specific geometric regularity conditions, our analysis links primal quadratic growth to dual quadratic decay, yielding error bounds that make the Fenchel duality gap a sharp proxy for progress toward the solution set. This leads to a duality-gap-based restart scheme that upgrades a broad class of sublinear proximal methods to provably linearly convergent methods and applies beyond the sparse GLM setting. For the implicit perspective regularizer, we further derive specialized routines that evaluate the regularizer and its proximal operator exactly in log-linear time, avoiding costly generic conic solvers. The resulting iterations are dominated by matrix-vector multiplications, which enables GPU acceleration. Experiments on synthetic and real-world datasets show orders-of-magnitude faster dual-bound computations and substantially improved BnB scalability on large instances.
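A sorting-based evaluation illustrates how log-linear time is possible. The sketch assumes the root-node regularizer has the perspective form $g(\bm{\beta}) = \min_{\bm{z}} \frac{1}{2}\sum_j \beta_j^2 / z_j$ subject to $0 \le z_j \le 1$, $\sum_j z_j \le k$, and $\lvert\beta_j\rvert \le M z_j$ (with $0/0 = 0$). This normalization was chosen to match the values quoted in the Figure 2 caption, and the construction is one possible one, not necessarily the paper's algorithm. The names `g_root` and `majorizes` are hypothetical. Sorting $\lvert\bm{\beta}\rvert$ costs $O(p \log p)$ and the remaining scan is linear. Inside the domain, the returned $k$-sparse $\bm{\omega}$ majorizes $\lvert\bm{\beta}\rvert$ and satisfies $g(\bm{\beta}) = \frac{1}{2}\lVert\bm{\omega}\rVert_2^2$.

```python
import numpy as np


def g_root(beta, k, M=np.inf, tol=1e-12):
    """Evaluate the assumed root-node regularizer in O(p log p) time.

    Assumed form (illustrative): g(beta) = min_z 0.5 * sum_j beta_j**2 / z_j
    s.t. 0 <= z_j <= 1, sum_j z_j <= k, |beta_j| <= M * z_j, with k >= 1.
    Returns (value, omega), where omega is k-sparse and majorizes |beta|.
    """
    a = np.sort(np.abs(np.asarray(beta, dtype=float)))[::-1]  # descending
    p = a.size
    if a[0] > M + tol or a.sum() > k * M + tol:
        return np.inf, None  # beta lies outside dom(g)
    omega = np.zeros(p)
    if p <= k:  # cardinality constraint inactive: z_j = 1 everywhere
        omega[:] = a
        return 0.5 * (a @ a), omega
    suffix = np.cumsum(a[::-1])[::-1]  # suffix[r] = a[r] + ... + a[p-1]
    for r in range(k):
        tau = suffix[r] / (k - r)  # level shared by the averaged entries
        if a[r] <= tau:  # always true by r = k - 1
            break
    omega[:r] = a[:r]  # the r largest entries are kept (z_j = 1)
    omega[r:k] = tau   # the remaining mass is spread evenly over k - r slots
    return 0.5 * (omega @ omega), omega


def majorizes(omega, beta, tol=1e-12):
    """True if omega majorizes |beta|, i.e. |beta| lies in conv(perm(omega))."""
    w = np.cumsum(np.sort(np.asarray(omega, dtype=float))[::-1])
    b = np.cumsum(np.sort(np.abs(np.asarray(beta, dtype=float)))[::-1])
    return bool(np.all(w >= b - tol) and abs(w[-1] - b[-1]) <= tol)


# The two examples from the Figure 3 caption.
for beta, k in [((1.4, 0.6), 1), ((0.6, 1.2, 1.2), 2)]:
    value, omega = g_root(beta, k)
    print(value, omega, majorizes(omega, beta))
```

On the two Figure 3 examples, this construction returns $\bm{\omega}=(2,0)$ and $\bm{\omega}=(1.5,1.5,0)$ (up to rounding), which are the vectors shown in that caption.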

Paper Structure

This paper contains 61 sections, 19 theorems, 114 equations, 23 figures, 15 tables, and 3 algorithms.

Introduction
Related Works
MIP for ML.
Perspective Formulations.
Lower Bound Calculation.
Proximal Gradient Methods.
Relationship to Dual-Based Strategies.
GPU Acceleration.
Preliminaries and Problem Formulation
Notation
Convex Analysis Background
Sparse GLMs, Branch-and-Bound, and Perspective Relaxations
Composite Reformulation and Fenchel Duality
A Linearly Convergent Algorithmic Framework
Structural Geometry
...and 46 more sections

Key Result

Lemma 3.3

Under the assumptions on $F$ and $G$ and the standard Fenchel duality assumption, the primal and dual problems are solvable and strong duality holds, that is, $\Phi^\star = \Psi^\star$. Moreover, the dual problem admits a unique optimal solution.
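A short calculation shows why strong duality turns the gap into a measure of progress. It is sketched here under stated assumptions, not quoted from the paper. Suppose the primal problem minimizes $\Phi$ with solution set $\mathcal{S}$ and the dual maximizes $\Psi$ with unique solution $\bm{\zeta}^\star$. Suppose also that the geometric regularity conditions take the form of quadratic growth, $\Phi(\bm{\beta}) - \Phi^\star \ge \frac{\mu}{2}\operatorname{dist}(\bm{\beta},\mathcal{S})^2$, and quadratic decay, $\Psi^\star - \Psi(\bm{\zeta}) \ge \frac{\nu}{2}\lVert\bm{\zeta}-\bm{\zeta}^\star\rVert^2$, on the region of interest, where $\mu, \nu > 0$ are placeholder constants. Using $\Phi^\star = \Psi^\star$:

$$
\Phi(\bm{\beta}) - \Psi(\bm{\zeta})
= \big(\Phi(\bm{\beta}) - \Phi^\star\big) + \big(\Psi^\star - \Psi(\bm{\zeta})\big)
\ge \frac{\mu}{2}\operatorname{dist}(\bm{\beta},\mathcal{S})^2 + \frac{\nu}{2}\lVert \bm{\zeta} - \bm{\zeta}^\star \rVert^2 .
$$

Writing $G = \Phi(\bm{\beta}) - \Psi(\bm{\zeta})$ for the duality gap, this gives $\operatorname{dist}(\bm{\beta},\mathcal{S}) \le \sqrt{2G/\mu}$ and $\lVert\bm{\zeta}-\bm{\zeta}^\star\rVert \le \sqrt{2G/\nu}$. A small gap therefore certifies that both iterates are close to optimal. The exact form and constants of the paper's conditions may differ.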

Figures (23)

Figure 1: Illustration of quadratic growth for the primal objective $\Phi(\bm{\beta})$ and quadratic decay for the dual objective $\Psi(\bm{\zeta})$.
Figure 2: 3D plots of $g_{\mathcal{N}}$ (left) and $g_{\mathcal{N}}^*$ (right) at the root node ($p=2$, $k=1$, $M=1$). Left: red diamonds mark the boundary $\lvert \beta_1 \rvert + \lvert \beta_2 \rvert = M$ of $\operatorname{dom}(g_{\mathcal{N}})$, drawn on the surface at height $g_{\mathcal{N}}(\bm{\beta})=\frac{1}{2}M^2$ and projected onto the base plane. Right: green squares mark the transition set $\max(\lvert \alpha_1 \rvert, \lvert \alpha_2 \rvert)=M$, drawn on the surface at height $g_{\mathcal{N}}^*(\bm{\alpha})=\frac{1}{2}M^2$ and projected onto the base plane; $g_{\mathcal{N}}^*$ grows quadratically inside this set and linearly outside it.
Figure 3: A vector $\bm{\omega}$ majorizes $\bm{\beta}$ if and only if $\bm{\beta} \in \operatorname{conv}(\mathrm{perm}(\bm{\omega}))$. Left ($p=2$, $k=1$): the hull is the segment between $(2,0)$ and $(0,2)$, containing $\bm{\beta}=(1.4,0.6)$. Right ($p=3$, $k=2$): the hull is the triangle with vertices $(1.5,1.5,0)$, $(1.5,0,1.5)$, and $(0,1.5,1.5)$, containing $\bm{\beta}=(0.6,1.2,1.2)$. Given $\bm{\beta}$, the paper's algorithm for evaluating $g_{\mathcal{N}}$ at the root node constructs such a $k$-sparse vector $\bm{\omega}$.
Figure 4: Running time comparison of evaluating $g_{\mathcal{N}}$ and the proximal operators of $g_{\mathcal{N}}$ and $g_{\mathcal{N}}^*$, where $\mathcal{N}$ is the root node. The baselines solve the corresponding SOCPs directly.
Figure 5: Running time comparison of solving the perspective relaxation at the root node. We set $M=2.0$, $\lambda_2=1.0$, and the $n$-to-$p$ ratio to 1. Gurobi cannot solve the relaxation of the cardinality-constrained logistic regression problem.
...and 18 more figures
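The conjugate $g_{\mathcal{N}}^*$ plotted in Figure 2 has a simple closed form under one assumed version of the root-node regularizer. That version uses perspective terms $\frac{1}{2}\beta_j^2/z_j$ with $0 \le z_j \le 1$, $\sum_j z_j \le k$, and $\lvert \beta_j \rvert \le M z_j$; it is an assumption, not a quotation from the paper. Maximizing over each $\beta_j$ gives $z_j h_M(\alpha_j)$, where $h_M(a) = \frac{1}{2}a^2$ for $\lvert a\rvert \le M$ and $h_M(a) = M\lvert a\rvert - \frac{1}{2}M^2$ otherwise. Because $h_M \ge 0$, the best $\bm{z}$ places full weight on the $k$ largest terms. The sketch below, using the hypothetical names `h_M` and `g_root_conj`, reproduces the caption's behavior for $p=2$, $k=1$, $M=1$.

```python
import numpy as np


def h_M(a, M):
    """Coordinate term: a**2/2 inside [-M, M], linear with slope M outside (assumed form)."""
    a = np.abs(np.asarray(a, dtype=float))
    return np.where(a <= M, 0.5 * a**2, M * a - 0.5 * M**2)


def g_root_conj(alpha, k, M):
    """Sum of the k largest h_M terms: conjugate of the assumed root-node regularizer."""
    terms = np.sort(h_M(alpha, M))[::-1]  # sorting costs O(p log p)
    return float(terms[:k].sum())


k, M = 1, 1.0
print(g_root_conj([1.0, 0.0], k, M))  # on the transition set max|alpha_i| = M: M**2/2
print(g_root_conj([0.5, 0.5], k, M))  # quadratic region
print(g_root_conj([2.0, 0.0], k, M), g_root_conj([3.0, 0.0], k, M))  # linear region, slope M
```

The two terms in $h_M$ agree at $\lvert a\rvert = M$, where both equal $\frac{1}{2}M^2$. This is why the transition set in Figure 2 sits at height $\frac{1}{2}M^2$.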

Theorems & Definitions (37)

Lemma 3.3
Theorem 3.5
Lemma 3.6
Lemma 3.7
Theorem 3.9
Proposition 3.11
Theorem 3.12
Theorem 3.13
Lemma 4.1
Lemma 4.2
...and 27 more
