Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates

Zhuanghua Liu, Luo Luo, Bryan Kian Hsiang Low

TL;DR

This work tackles large-scale finite-sum optimization with strongly convex components by introducing Lazy Incremental Symmetric Rank-1 (LISR-1) and its block extension LISR-$k$, which use symmetric rank-1 updates (block updates for LISR-$k$) in an incremental setting to estimate curvature. The algorithms keep the per-iteration cost low through lazy updates and a cyclic update scheme, and achieve a condition-number-free local superlinear convergence rate: $\mathcal{O}((1-d^{-1})^{\lceil t/n\rceil^2})$ for LISR-1 and $\mathcal{O}((1-k/d)^{\lceil t/n\rceil^2})$ for LISR-$k$, with $\mathcal{O}(1)$ gradient/Hessian-vector oracle calls and $\mathcal{O}(d^2)$ flops per iteration. The theoretical analysis uses a Hessian-approximation metric $\nu(\cdot,\cdot)$ under Broyden-family and greedy SR1 updates to bound the convergence and approximation errors. Empirical results on quadratic programs and regularized logistic regression show that the proposed methods outperform baseline IQN variants, supporting both the theoretical rates and practical efficiency, with the block variant offering further acceleration at modest additional cost.
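To get a feel for the rates quoted above, the factor inside the $\mathcal{O}(\cdot)$ bound can be evaluated numerically. The sketch below is only an illustration: the function name `lisr_rate_factor` is hypothetical, it assumes $t$ is the iteration counter, $n$ the number of component functions and $d$ the problem dimension, and it ignores the constants hidden by the $\mathcal{O}$ notation.

```python
def lisr_rate_factor(t: int, n: int, d: int, k: int = 1) -> float:
    """Evaluate (1 - k/d) ** (ceil(t/n) ** 2), the factor inside the O(.) bound.

    k = 1 corresponds to the LISR-1 rate, 1 < k <= d to the LISR-k rate.
    Hidden constants are ignored, so this is not an actual error bound.
    """
    if not (1 <= k <= d):
        raise ValueError("need 1 <= k <= d")
    if t < 1 or n < 1:
        raise ValueError("need t >= 1 and n >= 1")
    epochs = -(-t // n)  # integer ceiling of t / n
    return (1.0 - k / d) ** (epochs ** 2)


# Compare the rank-1 and block (k = 2) factors after 1, 2 and 3 passes
# over n = 10 components in dimension d = 4 (illustrative values only).
for t in (10, 20, 30):
    print(t, lisr_rate_factor(t, n=10, d=4), lisr_rate_factor(t, n=10, d=4, k=2))
```

Because the exponent is the square of the number of passes, the factor shrinks much faster than a linear rate would, and a larger block size $k$ makes the base $1 - k/d$ smaller.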

Abstract

We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian. The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate that depends on the condition number of the problem. This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework, which results in a condition-number-free local superlinear convergence rate. Furthermore, we can boost our method by applying the block update on the Hessian approximation, which leads to an even faster local convergence rate. The numerical experiments show that the proposed methods significantly outperform the baseline methods.
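For readers unfamiliar with the symmetric rank-1 update mentioned in the abstract, the following is a minimal sketch of the classical SR1 formula $B_+ = B + \frac{(y - Bs)(y - Bs)^\top}{(y - Bs)^\top s}$ in plain Python. It is not the paper's LISR-1 algorithm (it has no incremental or lazy scheme); the helper names `matvec` and `sr1_update`, the skip tolerance, and the 2x2 test matrix are all assumptions made for illustration.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def sr1_update(B: Matrix, s: Vector, y: Vector, tol: float = 1e-12) -> Matrix:
    """Classical SR1 update of a Hessian approximation B from a pair (s, y)."""
    r = [y_i - b_i for y_i, b_i in zip(y, matvec(B, s))]  # residual y - B s
    denom = sum(r_i * s_i for r_i, s_i in zip(r, s))      # (y - B s)^T s
    r_norm = sqrt(sum(r_i * r_i for r_i in r))
    s_norm = sqrt(sum(s_i * s_i for s_i in s))
    # Standard safeguard: skip the update when the denominator is tiny.
    if abs(denom) <= tol * r_norm * s_norm:
        return [row[:] for row in B]
    n = len(B)
    return [[B[i][j] + r[i] * r[j] / denom for j in range(n)] for i in range(n)]


# Illustrative quadratic with Hessian A, so that y = A s holds exactly.
A = [[2.0, 1.0], [1.0, 3.0]]
B = [[1.0, 0.0], [0.0, 1.0]]  # initial approximation (identity)
for s in ([1.0, 0.0], [0.0, 1.0]):
    B = sr1_update(B, s, matvec(A, s))
print(B)
```

On this small quadratic, two updates along the coordinate directions are enough for `B` to match `A`, which gives some intuition for why SR1-type updates can track curvature quickly.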

Paper Structure

This paper contains 35 sections, 16 theorems, 103 equations, 5 figures, 2 tables, and 4 algorithms.

Key Result

Lemma 4.1

The iteration formula (cur_iter_update) satisfies for all $t \geq 1$, where $\Gamma^t \coloneqq \|(\sum_{i=1}^n B_i^t)^{-1}\|$.

Figures (5)

  • Figure 1: Normalized error vs. the number of effective passes for the quadratic programming problem.
  • Figure 2: Normalized error vs. the number of effective passes for the regularized logistic regression problem on several real-world datasets.
  • Figure 3: Comparison of the proposed methods with baselines for the quadratic function minimization problem.
  • Figure 4: Comparison of the LISR-$k$ method with different choices of $k$ for the general function minimization.
  • Figure 5: Comparison of the LISR-$k$ method with different choices of $k$ for the general function minimization.

Theorems & Definitions (29)

  • Definition 3.3
  • Definition 3.4
  • Remark 3.5
  • Lemma 4.1
  • Remark 4.2
  • Lemma 4.3
  • Remark 4.4
  • Lemma 4.5
  • Theorem 4.6
  • Theorem 5.1
  • ...and 19 more