Self-concordant smoothing in proximal quasi-Newton algorithms for large-scale convex composite optimization

Adeyemi D. Adeoye; Alberto Bemporad

TL;DR

The paper develops a self-concordant smoothing framework for convex composite optimization: the nonsmooth term g is replaced by a self-concordant smooth approximation g_s obtained via infimal convolution, which yields a diagonal variable metric and an adaptive step-length rule. Building on this, two proximal quasi-Newton algorithms, Prox-N-SCORE and Prox-GGN-SCORE, are proposed to solve large-scale problems with structured penalties efficiently; the GGN variant uses a low-rank approximation of the Hessian inverse. The authors establish global and local convergence guarantees under standard assumptions and validate the approach through numerical experiments on sparse logistic regression, sparse-group lasso, and sparse deconvolution, with a public Julia implementation. The framework preserves problem structure while improving convergence speed and scalability, making it well suited to high-dimensional machine learning and signal-processing tasks with nonsmooth regularizers.
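One standard way to exploit low-rank structure when inverting a Hessian approximation is the Woodbury identity. The sketch below is an illustration, not the authors' implementation: it assumes a matrix of the form $\mathrm{diag}(d) + J^\top Q J$ with $J \in \mathbb{R}^{m \times n}$ and $m \ll n$, and the names `woodbury_solve`, `d`, `J`, `Q`, and `b` are hypothetical. Only an $m \times m$ system is solved, instead of an $n \times n$ one.

```python
import numpy as np

def woodbury_solve(d, J, Q, b):
    """Solve (diag(d) + J.T @ Q @ J) x = b via the Woodbury identity.

    Assumed shapes (illustrative, not taken from the paper):
      d: (n,) strictly positive diagonal entries
      J: (m, n) with m much smaller than n
      Q: (m, m) symmetric positive definite
      b: (n,) right-hand side
    """
    dinv_b = b / d                      # D^{-1} b
    dinv_jt = J.T / d[:, None]          # D^{-1} J^T, shape (n, m)
    # Small m-by-m system: Q^{-1} + J D^{-1} J^T
    small = np.linalg.inv(Q) + J @ dinv_jt
    correction = dinv_jt @ np.linalg.solve(small, J @ dinv_b)
    return dinv_b - correction

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 200, 5
    d = rng.uniform(0.5, 2.0, n)
    J = rng.standard_normal((m, n))
    A = rng.standard_normal((m, m))
    Q = A @ A.T + np.eye(m)
    b = rng.standard_normal(n)
    x = woodbury_solve(d, J, Q, b)
    residual = np.linalg.norm((np.diag(d) + J.T @ Q @ J) @ x - b)
    print("residual norm:", residual)
```

The dominant cost is forming and solving the $m \times m$ system, which is why this kind of reformulation pays off mainly when $m$ is small relative to $n$.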

Abstract

We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other nonsmooth. The key highlight is a natural property of the resulting problem's structure that yields a variable metric selection method and a step length rule especially suited to proximal quasi-Newton algorithms. Also, we efficiently handle specific structures promoted by the nonsmooth term, such as l1-regularization and group lasso penalties. A convergence analysis for the class of proximal quasi-Newton methods covered by our framework is presented. In particular, we obtain guarantees, under standard assumptions, for two algorithms: Prox-N-SCORE (a proximal Newton method) and Prox-GGN-SCORE (a proximal generalized Gauss-Newton method). The latter uses a low-rank approximation of the Hessian inverse, reducing most of the cost of matrix inversion and making it effective for overparameterized machine learning models. Numerical experiments on synthetic and real data demonstrate the efficiency of both algorithms against state-of-the-art approaches. A Julia implementation is publicly available at https://github.com/adeyemiadeoye/SelfConcordantSmoothOptimization.jl.
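To illustrate why a diagonal variable metric pairs well with an $\ell_1$ penalty, the following minimal sketch computes a scaled proximal operator in closed form. It assumes the standard definition $\operatorname{prox}(x) = \arg\min_u \{\lambda\|u\|_1 + \tfrac{1}{2}\sum_i h_i (u_i - x_i)^2\}$ with $h_i > 0$; the function name `scaled_prox_l1` and its arguments are hypothetical and are not taken from the paper or its Julia package. With a diagonal metric the problem separates by coordinate, so each entry is soft-thresholded with its own threshold $\lambda / h_i$.

```python
def scaled_prox_l1(x, lam, h):
    """Scaled proximal operator of lam * ||.||_1 under a diagonal metric.

    Returns argmin_u lam * sum(|u_i|) + 0.5 * sum(h_i * (u_i - x_i)**2),
    assuming every h_i is strictly positive.
    """
    result = []
    for xi, hi in zip(x, h):
        threshold = lam / hi                   # per-coordinate threshold
        magnitude = max(abs(xi) - threshold, 0.0)
        result.append(magnitude if xi >= 0 else -magnitude)
    return result

if __name__ == "__main__":
    # A larger h_i means a smaller threshold, so less shrinkage.
    print(scaled_prox_l1([3.0, -0.2, 1.0], 1.0, [1.0, 1.0, 4.0]))
```

A non-diagonal metric would couple the coordinates and generally require an inner iterative solver, which is one reason diagonal metrics are attractive in proximal quasi-Newton schemes.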

Paper Structure

This paper contains 16 sections, 16 theorems, 108 equations, 8 figures, 2 tables, 2 algorithms.

Introduction
Notation and preliminaries
Self-concordant regularization
Self-concordant regularization via infimal convolution
A proximal quasi-Newton scheme
Variable metric and adaptive step length selection
A proximal generalized Gauss-Newton algorithm
Structured penalties
Structure reformulation for self-concordant smoothing
Prox-decomposition and smoothness properties
Convergence analysis
Numerical experiments
Sparse logistic regression
Sparse-group lasso
Sparse deconvolution
...and 1 more section

Key Result

lemma 1

Let $\phi \in \Gamma_0(\mathbb{R})$ be a function from $\mathbb{R}$ to $\mathbb{R} \cup \{+\infty\}$, and let the function $h\colon \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be defined by $h(x) \coloneqq \sum_{i=1}^{n} \lambda_i \phi(x^{(i)})$ with $x^{(i)}$ ...

Figures (8)

Figure 1: Generalized self-concordant smoothing of $\|\cdot\|_1$ with $\phi(t) = \sqrt{1+|t|^2}-1$ (left) and $\phi(t) = \frac{1}{2}\left[\sqrt{1+4t^2}-1+\log\left(\frac{\sqrt{1+4t^2}-1}{2t^2}\right)\right]$ (right). The smooth approximation is shown for $\mu=0.2,0.5,1.0$.
Figure 2: Behaviour of Prox-N-SCORE and Prox-GGN-SCORE for different fixed values of $\alpha_k$ in problem \ref{eq:logexample}.
Figure 3: Overparameterized problem (first row) and non-overparameterized problems (second row) in \ref{eq:logexample}. Prox-GGN-SCORE reduces most of the computational burden of Prox-N-SCORE if $m+n_y < n$ (or $m\ll n$). However, Prox-N-SCORE solves the problem faster, and is more stable, if $n < m+n_y$ (or $n\ll m$).
Figure 4: Performance profile (CPU time) for the sparse logistic regression problem \ref{eq:logexample} using the LIBSVM datasets summarized in Table \ref{tab:data-summary}. Here, $\tau$ denotes the performance ratio (CPU times in seconds) averaged over 20 independent runs with different random initializations, and $\rho(\tau)$ is the corresponding frequency.
Figure 5: Performance of Prox-GGN-SCORE (alg.A), SSNAL (alg.B), Prox-Grad (alg.C) and BCD (alg.D) on the sparse-group lasso problem \ref{eq:sgl-example} for different values of $m$ and $n$. nnz stands for the number of nonzero entries of $x^\star$ and of the solutions found by the algorithms. MSE stands for the mean squared error between the true solution $x^\star$ and the estimated solutions.
...and 3 more figures
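
The smoothing shown in Figure 1 (left) can be checked numerically in one dimension. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the smoothing is the infimal convolution $g_s(x;\mu) = \inf_w \{|w| + h_\mu(x-w)\}$ with $h_\mu(t) = \mu\,\phi(t/\mu)$ and $\phi(t) = \sqrt{1+t^2}-1$. Under that assumption the infimum is attained at $w = 0$ (because $|h_\mu'| < 1$ everywhere), giving $g_s(x;\mu) = \sqrt{\mu^2+x^2}-\mu$. All function names are hypothetical.

```python
import math

def phi(t):
    """Kernel from the left panel of Figure 1: sqrt(1 + t^2) - 1."""
    return math.sqrt(1.0 + t * t) - 1.0

def smooth_abs_closed_form(x, mu):
    """Closed form of the smoothed |x| under the assumed scaling h_mu(t) = mu * phi(t / mu)."""
    return math.sqrt(mu * mu + x * x) - mu

def smooth_abs_numeric(x, mu, half_width=5.0, num=2001):
    """Brute-force infimal convolution of |.| with h_mu over a uniform grid of w values."""
    best = math.inf
    for k in range(num):
        w = -half_width + 2.0 * half_width * k / (num - 1)
        value = abs(w) + mu * phi((x - w) / mu)
        best = min(best, value)
    return best

if __name__ == "__main__":
    for mu in (0.2, 0.5, 1.0):  # the values of mu used in Figure 1
        x = 0.7
        print(mu, smooth_abs_numeric(x, mu), smooth_abs_closed_form(x, mu))
```

Under the same assumption, the approximation error satisfies $0 \le |x| - g_s(x;\mu) \le \mu$, so smaller $\mu$ tracks $|x|$ more closely at the cost of larger curvature near the origin.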

Theorems & Definitions (37)

definition 1: Generalized self-concordant function on ${\mathbb R}$
definition 2: Generalized self-concordant function on ${\mathbb R}^n$ of order $\nu$
definition 3: Self-concordant smoothing function
definition 4: Infimal convolution
definition 5: Inf-conv regularization
definition 6: Scaled proximal operator
remark 1
remark 2
lemma 1
proof
...and 27 more
