Non-Convex Self-Concordant Functions: Practical Algorithms and Complexity Analysis

Donald Goldfarb, Lexiao Lai, Tianyi Lin, Jiayu Zhang

TL;DR

The paper extends self-concordance to non-convex optimization by introducing two function classes, $(\kappa,\ell)$-weakly self-concordant functions and $F$-based $\kappa$-self-concordant functions, and develops two second-order algorithms: a regularized Newton method (RNM) and an adaptive regularization method (ARM). Both reach an $ε$-approximate first-order stationary point in $O(ε^{-2})$ iterations, and ARM, when equipped with an oracle that detects negative curvature, also converges to an approximate second-order stationary point. The analysis rests on a descent inequality and yields additional rates under curvature assumptions. The framework covers practical non-convex problems, including generalized phase retrieval, sparse dictionary learning, and nonnegative matrix factorization (NMF). Experiments on NMF and CNN training show that ARM and KFAC-based variants are more robust and efficient than cubic regularization and trust-region methods. The work highlights the potential of self-concordant regularization for scalable, robust optimization in large-scale machine learning and related domains.

Abstract

We extend the standard notion of self-concordance to non-convex optimization and develop a family of second-order algorithms with global convergence guarantees. In particular, two function classes, weakly self-concordant functions and $F$-based self-concordant functions, generalize the self-concordant framework beyond convexity, without assuming the Lipschitz continuity of the gradient or Hessian. For these function classes, we propose a regularized Newton method and an adaptive regularization method that achieve an $ε$-approximate first-order stationary point in $O(ε^{-2})$ iterations. Equipped with an oracle capable of detecting negative curvature, the adaptive algorithm can further attain convergence to an approximate second-order stationary point. Our experimental results demonstrate that the proposed methods offer superior robustness and computational efficiency compared to cubic regularization and trust-region approaches, underscoring the broad potential of self-concordant regularization for large-scale and neural network optimization problems.

Paper Structure

This paper contains 26 sections, 21 theorems, 69 equations, 1 figure, 1 table, 4 algorithms.

Key Result

Proposition 2.4

Let $\alpha_1,\alpha_2 > 0$. If $f_i$ is $F_i$-based $\kappa_i$-self-concordant for $i=1,2$, then $\alpha_1f_1+\alpha_2f_2$ is $(\alpha_1F_1+\alpha_2F_2)$-based $\max(\frac{\kappa_1}{\sqrt{\alpha_1}}, \frac{\kappa_2}{\sqrt{\alpha_2}})$-self-concordant.
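
As a quick numerical illustration of Proposition 2.4, the minimal Python sketch below computes the self-concordance parameter that the proposition assigns to the weighted sum $\alpha_1f_1+\alpha_2f_2$. The function name `combined_kappa` and the sample values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math


def combined_kappa(alpha1: float, kappa1: float,
                   alpha2: float, kappa2: float) -> float:
    """Parameter of alpha1*f1 + alpha2*f2 according to Proposition 2.4.

    Assumes f_i is F_i-based kappa_i-self-concordant and alpha_i > 0.
    Returns max(kappa1 / sqrt(alpha1), kappa2 / sqrt(alpha2)).
    """
    if alpha1 <= 0 or alpha2 <= 0:
        # The proposition is stated only for strictly positive weights.
        raise ValueError("weights alpha1 and alpha2 must be positive")
    return max(kappa1 / math.sqrt(alpha1), kappa2 / math.sqrt(alpha2))


# Hypothetical values: alpha1 = 4, kappa1 = 2, alpha2 = 1, kappa2 = 3.
# 2 / sqrt(4) = 1 and 3 / sqrt(1) = 3, so the combined parameter is 3.
print(combined_kappa(4.0, 2.0, 1.0, 3.0))  # prints 3.0
```

Under these assumed values, scaling $f_1$ by a larger weight shrinks its contribution $\kappa_1/\sqrt{\alpha_1}$, so the parameter of the sum is governed by the term with the larger ratio.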

Figures (1)

  • Figure 1: Performance comparison on NMF with MSE loss (left) and KL divergence (right).

Theorems & Definitions (47)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Proposition 2.4
  • proof
  • Proposition 2.5
  • proof
  • Proposition 2.6
  • proof
  • Theorem 3.2
  • ...and 37 more