SAPPHIRE: Preconditioned Stochastic Variance Reduction for Faster Large-Scale Statistical Learning
Jingruo Sun, Zachary Frangella, Madeleine Udell
TL;DR
SAPPHIRE tackles ill-conditioned, regularized empirical risk minimization at scale by combining sketching-based preconditioning (SSN and NySSN) with variance-reduced gradients and a scaled proximal mapping for non-smooth penalties. The method achieves condition-number-free linear convergence under quadratic regularity and remains robust with infrequent preconditioner updates, while providing ergodic sublinear rates in broader convex settings and local linear convergence independent of conditioning. Theoretical results are complemented by extensive experiments on convex (e.g., Lasso, logistic regression with elastic-net) and non-convex (e.g., SCAD, MCP) penalties, showing up to 20x faster convergence than key baselines. The work offers a scalable, practical framework for large-scale statistical learning in domains with highly ill-conditioned data, such as genomics and advertising.
Abstract
Regularized empirical risk minimization (rERM) has become important in data-intensive fields such as genomics and advertising, with stochastic gradient methods typically used to solve the largest problems. However, ill-conditioned objectives and non-smooth regularizers undermine the performance of traditional stochastic gradient methods, leading to slow convergence and significant computational costs. To address these challenges, we propose the $\texttt{SAPPHIRE}$ ($\textbf{S}$ketching-based $\textbf{A}$pproximations for $\textbf{P}$roximal $\textbf{P}$reconditioning and $\textbf{H}$essian $\textbf{I}$nexactness with Variance-$\textbf{RE}$duced Gradients) algorithm, which integrates sketch-based preconditioning to tackle ill-conditioning and uses a scaled proximal mapping to minimize the non-smooth regularizer. This stochastic variance-reduced algorithm achieves condition-number-free linear convergence to the optimum, delivering an efficient and scalable solution for large-scale, ill-conditioned composite convex machine learning problems. Extensive experiments on lasso and logistic regression demonstrate that $\texttt{SAPPHIRE}$ often converges $20$ times faster than other common choices such as $\texttt{Catalyst}$, $\texttt{SAGA}$, and $\texttt{SVRG}$. This advantage persists even when the objective is non-convex or the preconditioner is infrequently updated, highlighting its robust and practical effectiveness.
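To make the three ingredients named in the abstract concrete (a preconditioner, a variance-reduced gradient, and a scaled proximal step), the following is a minimal illustrative sketch for the lasso. It is not the authors' implementation. In particular, it assumes a simple diagonal preconditioner built from the Hessian diagonal as a stand-in for the sketch-based preconditioners, because a diagonal preconditioner makes the scaled proximal mapping of the $\ell_1$ penalty a closed-form per-coordinate soft-threshold. All function names, parameters, and default values (`precond_prox_svrg_lasso`, `eta`, `rho`, `batch`) are hypothetical choices for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise soft-thresholding; t may hold one threshold per coordinate.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(A, b, w, lam):
    # (1/2n) * ||A w - b||^2 + lam * ||w||_1
    r = A @ w - b
    return 0.5 * (r @ r) / A.shape[0] + lam * np.abs(w).sum()

def precond_prox_svrg_lasso(A, b, lam, eta=0.1, epochs=30, batch=16,
                            rho=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # ASSUMPTION: diagonal preconditioner P = diag((1/n) A^T A) + rho,
    # used here in place of a sketch-based Hessian approximation.
    p = (A * A).sum(axis=0) / n + rho
    w = np.zeros(d)
    for _ in range(epochs):
        # Snapshot point and its full gradient (SVRG-style outer loop).
        w_snap = w.copy()
        full_grad = A.T @ (A @ w_snap - b) / n
        for _ in range(n // batch):
            idx = rng.choice(n, size=batch, replace=False)
            Ab, bb = A[idx], b[idx]
            g = Ab.T @ (Ab @ w - bb) / batch            # minibatch gradient at w
            g_snap = Ab.T @ (Ab @ w_snap - bb) / batch  # same batch at snapshot
            v = g - g_snap + full_grad                  # variance-reduced gradient
            # Preconditioned step followed by the prox in the P-norm:
            # for diagonal P this is soft-thresholding with threshold eta*lam/p_i.
            w = soft_threshold(w - eta * v / p, eta * lam / p)
    return w
```

With a non-diagonal preconditioner the scaled proximal mapping generally has no closed form and would itself need an inner solver; the diagonal choice above sidesteps that only to keep the sketch short.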
