The rate of convergence of Bregman proximal methods: Local geometry vs. regularity vs. sharpness

Waïss Azizian; Franck Iutzeler; Jérôme Malick; Panayotis Mertikopoulos

TL;DR

This paper analyzes the last-iterate convergence rate of Bregman proximal (BP) methods for constrained, possibly non-monotone variational inequalities by introducing the Legendre exponent $\beta$, a quantity that measures the growth rate of the underlying Bregman function near a solution. It shows a sharp dichotomy: when $\beta=0$, the last iterate converges at a linear rate, whereas for $\beta>0$ convergence is, in general, sublinear, with the Bregman divergence to the solution decaying as $D(x^{\ast},x_t)=O(t^{-(1-\beta)/\beta})$. In linearly constrained problems, the authors derive faster rates along sharp directions: convergence in a finite number of steps for Euclidean kernels, linear (exponential) rates for entropic kernels, and power-law rates for Tsallis-type kernels, all tied to the structure of the active constraints. The results show how local geometry and constraint activity determine algorithmic speed, which can guide the choice of kernel in practice and suggests future work on non-Euclidean BP methods in broader geometric settings.
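
To make the dependence on $\beta$ concrete, the exponent in the sublinear rate can be evaluated at a few sample values. The values of $\beta$ below are chosen purely for illustration and are not attributed to any particular kernel in the paper.

```latex
\[
  \frac{1-\beta}{\beta} = \frac{1}{\beta} - 1,
  \qquad
  \begin{array}{c|c|c}
    \beta & (1-\beta)/\beta & D(x^{\ast},x_t) \\ \hline
    1/4 & 3   & O(t^{-3})   \\
    1/2 & 1   & O(t^{-1})   \\
    3/4 & 1/3 & O(t^{-1/3})
  \end{array}
\]
% As \beta \to 0^{+} the exponent grows without bound (faster than any fixed
% polynomial rate, in line with the linear rate at \beta = 0); as \beta \to 1^{-}
% the exponent tends to 0 and the guarantee degrades.
```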

Abstract

We examine the last-iterate convergence rate of Bregman proximal methods - from mirror descent to mirror-prox and its optimistic variants - as a function of the local geometry induced by the prox-mapping defining the method. For generality, we focus on local solutions of constrained, non-monotone variational inequalities, and we show that the convergence rate of a given method depends sharply on its associated Legendre exponent, a notion that measures the growth rate of the underlying Bregman function (Euclidean, entropic, or other) near a solution. In particular, we show that boundary solutions exhibit a stark separation of regimes between methods with a zero and non-zero Legendre exponent: the former converge at a linear rate, while the latter converge, in general, sublinearly. This dichotomy becomes even more pronounced in linearly constrained problems where methods with entropic regularization achieve a linear convergence rate along sharp directions, compared to convergence in a finite number of steps under Euclidean regularization.
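
The separation of regimes can be observed on a toy problem. The sketch below is a minimal illustration, not the authors' code: it assumes the one-dimensional domain $[0,\infty)$ with vector field $v(x) = x$ (so the solution $x^{\ast} = 0$ sits on the boundary), a constant step size, and the kernels $h(x) = x^2/2$ (Euclidean) and $h(x) = x\log x - x$ (entropic). The function name `mirror_descent_1d` and all parameter values are hypothetical choices made for this example.

```python
import math


def mirror_descent_1d(kernel, x0=0.5, step=0.5, n_iters=2000):
    """Mirror descent on [0, inf) with v(x) = x; returns D(x*, x_t) for x* = 0.

    Assumed toy setup (not taken from the paper's text):
      - "euclidean": h(x) = x^2 / 2,       D(0, x) = x^2 / 2
      - "entropic":  h(x) = x log x - x,   D(0, x) = x
    """
    x = x0
    divergences = []
    for _ in range(n_iters):
        v = x  # v(x) = x is the gradient of g(x) = x^2 / 2
        if kernel == "euclidean":
            # Euclidean prox-mapping: projected gradient step onto [0, inf)
            x = max(0.0, x - step * v)
            divergences.append(0.5 * x * x)
        elif kernel == "entropic":
            # Entropic prox-mapping: multiplicative (exponentiated) update
            x = x * math.exp(-step * v)
            divergences.append(x)
        else:
            raise ValueError(f"unknown kernel: {kernel!r}")
    return divergences


if __name__ == "__main__":
    eu = mirror_descent_1d("euclidean", n_iters=20)
    en = mirror_descent_1d("entropic", n_iters=2000)
    # Euclidean: successive ratios are constant, i.e. a geometric (linear) rate.
    print("Euclidean ratio D_{t+1}/D_t:", eu[10] / eu[9])
    # Entropic: t * D_t levels off near 1/step, i.e. a 1/t (sublinear) rate.
    print("Entropic t * D_t at t=2000:", 2000 * en[-1])
```

In this toy run the Euclidean divergence shrinks by a fixed factor at every step, while the entropic divergence decays like $1/(\gamma t)$ for step size $\gamma$. The latter would match the $O(t^{-(1-\beta)/\beta})$ form for $\beta = 1/2$, a value assumed here for comparison rather than quoted from the text above.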

Paper Structure (30 sections, 21 theorems, 117 equations, 2 figures, 2 tables)

Introduction
Our contributions
Problem setup and preliminaries
Blanket assumptions
BP methods
Motivating examples
The Legendre exponent and convergence rate analysis
The Legendre exponent
Convergence rate analysis
Proof of the main convergence theorem (label: thm:general)
Case 1: $t=1$
Case 2: $t>1$
Step 1: Stability
Step 2: Convergence rate analysis
Finer results for linearly constrained problems
...and 15 more sections

Key Result

Lemma 1

Suppose that $f\colon\mathbb{R}_+\to\mathbb{R}_+$ admits the asymptotic expansion $f(u) = u - \lambda u^{1+r} + o(u^{1+r})$ as $u\to 0^{+}$ for positive constants $\lambda,r>0$. Then, for $u_{1} > 0$ small enough, the sequence $u_{t+1} = f(u_{t})$, $t=1,2,\dotsc$, converges to $0$ at a rate of $u_{t} \sim (\lambda rt)^{-1/r}$.
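
The rate in Lemma 1 can be checked numerically. The sketch below assumes the expansion takes the form $f(u) = u - \lambda u^{1+r} + o(u^{1+r})$, which is the form consistent with the stated rate, and drops the remainder term so that $f(u) = u - \lambda u^{1+r}$ exactly. The function names and the particular values of $\lambda$, $r$, and $u_1$ are illustrative assumptions.

```python
def iterate_map(lam, r, u1=0.1, t=100_000):
    """Return u_t for the recursion u_{s+1} = u_s - lam * u_s**(1 + r), starting at u_1."""
    u = u1
    for _ in range(t - 1):
        u = u - lam * u ** (1 + r)
    return u


def predicted_rate(lam, r, t):
    """Asymptotic prediction u_t ~ (lam * r * t)**(-1/r) from Lemma 1."""
    return (lam * r * t) ** (-1.0 / r)


if __name__ == "__main__":
    for lam, r in [(1.0, 2.0), (0.5, 1.0)]:
        t = 100_000
        ratio = iterate_map(lam, r, t=t) / predicted_rate(lam, r, t)
        # The ratio should approach 1 as t grows.
        print(f"lambda={lam}, r={r}: u_t / prediction = {ratio:.5f}")
```

The heuristic behind the lemma is that $u_{t+1}^{-r} = u_t^{-r} + \lambda r + o(1)$, so $u_t^{-r}$ grows like $\lambda r t$; the simulation simply confirms that the ratio of the iterate to the prediction settles near one.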

Figures (2)

Figure 1: The rate of convergence of mirror descent (eq:MD) in Examples 1 to 4 (Euclidean through Hellinger). The Euclidean and shifted Hellinger regularizers lead to a geometric rate (see left figure); all other examples converge at a polynomial rate.
Figure 2: Different boundary solution configurations on the $2$-dimensional unit simplex $\mathcal{X} = \{(x_{1},x_{2},x_{3})\in\mathbb{R}_{+}^{3} : x_{1} + x_{2} + x_{3} = 1\}$ of $\mathbb{R}^{3}$: a non-extreme solution where $g$ is sharp ($\mathcal{A} = \mathcal{A}_{\sharp} = \{1\}$, $\mathcal{A}_{\flat} = \varnothing$; left), an extreme solution where $g$ is not sharp ($\mathcal{A} = \{1,2\}$, $\mathcal{A}_{\sharp} = \{1\}$, $\mathcal{A}_{\flat}=\{2\}$; middle), and a sharp solution ($\mathcal{A} = \mathcal{A}_{\sharp} = \{1, 2\}$, $\mathcal{A}_{\flat} = \varnothing$; right).

Theorems & Definitions (57)

Definition 1: Bregman regularizers and related notions
Remark 1
Example 1: Euclidean regularization
Example 2: Entropic regularization
Lemma 1
Example 3: Fractional power
Example 4: Hellinger distance
Definition 2
Example 5: Non-compatible topologies
Theorem 1
...and 47 more
