Accelerating Trust-Region Methods: An Attempt to Balance Global and Local Efficiency

Yuntian Jiang, Chuwen Zhang, Bo Jiang, Yinyu Ye

TL;DR

The paper addresses the balance between global speed and local convergence in second-order convex optimization by introducing accelerated trust-region methods that exploit TR_+ with primal-dual information. It develops two variants: a Global-Local Balanced method with Local Detection that achieves $\tilde{O}(\epsilon^{-1/3})$ global complexity while maintaining quadratic local convergence, and an Extragradient variant that achieves near-optimal $\tilde{O}(\epsilon^{-2/7})$ global complexity at the cost of losing quadratic local convergence. The Local Detection mechanism uses the dual multiplier λ to identify local quadratic regions and trigger Newton-like steps. Together, the two variants reveal a phase transition: quadratic local convergence survives moderate global acceleration but is lost when global acceleration is pushed to the extreme. Numerical experiments on regularized logistic regression corroborate the theoretical trade-offs, showing clear global benefits and differing local performance between the two variants. The work advances the design of accelerated second-order methods by integrating primal-dual trust-region information and region detection into estimating-sequence frameworks.

Abstract

Balancing the global and local efficiency of second-order optimization algorithms has historically been difficult. For instance, the classical Newton's method possesses excellent local convergence but lacks global guarantees, often diverging when the starting point is far from the optimal solution~\cite{more1982newton,dennis1996numerical}. In contrast, accelerated second-order methods offer strong global convergence guarantees, yet they tend to converge at a slower local rate~\cite{carmon2022optimal,chen2022accelerating,jiang2020unified}. Existing second-order methods struggle to balance global and local performance, leaving open the question of how much second-order methods can be globally accelerated while maintaining an excellent local convergence guarantee. In this paper, we tackle this challenge by proposing, for the first time, accelerated trust-region-type methods and leveraging their unique primal-dual information. Our primary technical contribution is \emph{Accelerating with Local Detection}, which utilizes the Lagrange multiplier to detect local regions and achieves a global complexity of $\tilde{O}(\epsilon^{-1/3})$ while maintaining quadratic local convergence. We further explore the trade-off when pushing the global convergence to the limit. In particular, we propose the \emph{Accelerated Trust-Region Extragradient Method}, which has a near-optimal global rate of $\tilde{O}(\epsilon^{-2/7})$ but loses the quadratic local convergence. This reveals a phase transition in accelerated trust-region-type methods: the excellent local convergence can be maintained when achieving a moderate global acceleration but is lost when pursuing extreme global efficiency. Numerical experiments further confirm the results indicated by our convergence analysis.
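
To illustrate why the Lagrange multiplier can act as a locality signal, the following is a minimal sketch, not the paper's algorithm. It relies only on the standard optimality conditions of the trust-region subproblem $\min_d g^\top d + \tfrac{1}{2} d^\top H d$ subject to $\|d\| \le r$: for positive definite $H$, the solution satisfies $(H + \lambda I)d = -g$ with $\lambda \ge 0$, and $\lambda = 0$ exactly when the Newton step already lies inside the region. The function names `tr_subproblem` and `looks_local`, the bisection solver, the zero threshold, and the toy quadratic are all assumptions made for illustration; the paper's actual detection rule is not given in this summary.

```python
import numpy as np


def tr_subproblem(g, H, radius, tol=1e-10, max_iter=200):
    """Solve min_d g^T d + 0.5 d^T H d  s.t. ||d|| <= radius, for H positive definite.

    Returns (d, lam), where lam >= 0 is the multiplier of the norm constraint.
    Illustrative bisection solver; not the solver used in the paper.
    """
    n = g.shape[0]
    # Try the unconstrained (Newton) step first: if it fits, the multiplier is zero.
    d = np.linalg.solve(H, -g)
    if np.linalg.norm(d) <= radius:
        return d, 0.0
    # Otherwise the constraint is active. ||d(lam)|| with d(lam) = -(H + lam I)^{-1} g
    # decreases in lam, and lam = ||g|| / radius is large enough to be feasible.
    lo, hi = 0.0, np.linalg.norm(g) / radius
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        d = np.linalg.solve(H + lam * np.eye(n), -g)
        if np.linalg.norm(d) > radius:
            lo = lam
        else:
            hi = lam
        if hi - lo <= tol * max(1.0, hi):
            break
    lam = hi  # hi always yields a feasible step
    d = np.linalg.solve(H + lam * np.eye(n), -g)
    return d, lam


def looks_local(lam, threshold=0.0):
    """Hypothetical detection test: a small multiplier suggests a Newton-like step."""
    return lam <= threshold


# Toy strongly convex quadratic f(x) = 0.5 x^T A x - b^T x (an assumed example).
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)

x_far = np.array([10.0, 10.0])
d_far, lam_far = tr_subproblem(A @ x_far - b, A, radius=1.0)

x_near = np.array([1.1, 0.15])
d_near, lam_near = tr_subproblem(A @ x_near - b, A, radius=1.0)

print("far from optimum: lam =", lam_far, "local?", looks_local(lam_far))
print("near the optimum: lam =", lam_near, "local?", looks_local(lam_near))
```

Far from the minimizer the norm constraint is active and the multiplier is positive. Close to it the subproblem returns the plain Newton step with a zero multiplier, which is the kind of signal a detection rule can use to switch to Newton-like steps.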

Paper Structure

This paper contains 21 sections, 39 theorems, 204 equations, 1 figure, 2 algorithms.

Key Result

Lemma 2.5

If $f:\mathbb{R}^n \mapsto \mathbb{R}$ satisfies Assumption assm.lipschitz, then for all $x,y\in \mathbb{R}^n$, we have

Figures (1)

  • Figure 1: Logistic regression using the LIBSVM datasets

Theorems & Definitions (71)

  • Definition 2.1
  • Lemma 2.5: Lemma 4.1.1, nesterov_lectures_2018
  • Lemma 2.6
  • Lemma 2.7
  • Lemma 2.8: Section 3, conn2000trust
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Lemma 3.3
  • ...and 61 more