Accelerated Bregman divergence optimization with SMART: an information geometry point of view

Maren Raus; Yara Elshiaty; Stefania Petra

TL;DR

This paper explores the exponentiated gradient method, which can be viewed both as a Bregman proximal gradient method and as a Riemannian gradient descent on the parameter manifold of a corresponding distribution of the exponential family.

Abstract

We investigate the problem of minimizing the Kullback-Leibler divergence between a linear model $Ax$ and a positive vector $b$ over different convex domains (positive orthant, $n$-dimensional box, probability simplex). Our focus is on the SMART method, which employs efficient multiplicative updates. We explore the exponentiated gradient method, which can be viewed both as a Bregman proximal gradient method and as a Riemannian gradient descent on the parameter manifold of a corresponding distribution of the exponential family. This dual interpretation enables us to connect the two viewpoints and to obtain accelerated SMART iterates while smoothly incorporating constraints. The performance of the proposed acceleration schemes is demonstrated on large-scale numerical examples.
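The multiplicative update mentioned in the abstract can be illustrated with a minimal sketch on the positive orthant. The function names (`kl_objective`, `smart_step`, `smart`), the random test data, and the step size $\tau = 1/\max_j \sum_i A_{ij}$ are assumptions made for this illustration and are not taken from the paper. The sketch further assumes that $A$ is entrywise nonnegative with $Ax > 0$, and it shows only the plain, non-accelerated iteration.

```python
import numpy as np

def kl_objective(A, x, b):
    """KL(Ax, b) = sum(Ax * log(Ax / b) - Ax + b), for positive Ax and b."""
    Ax = A @ x
    return float(np.sum(Ax * np.log(Ax / b) - Ax + b))

def smart_step(A, x, b, tau):
    """One exponentiated gradient (multiplicative) update; keeps x positive."""
    grad = A.T @ np.log((A @ x) / b)  # gradient of KL(Ax, b) with respect to x
    return x * np.exp(-tau * grad)

def smart(A, b, x0, num_iters=200):
    """Run the plain multiplicative iteration and record objective values."""
    # Assumed step size: reciprocal of the largest column sum of A.
    tau = 1.0 / np.max(A.sum(axis=0))
    x = x0.copy()
    values = [kl_objective(A, x, b)]
    for _ in range(num_iters):
        x = smart_step(A, x, b, tau)
        values.append(kl_objective(A, x, b))
    return x, values

# Hypothetical test problem: consistent data b = A @ x_true.
rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(5, 8))
x_true = rng.uniform(0.5, 2.0, size=8)
b = A @ x_true
x, values = smart(A, b, np.ones(8), num_iters=300)
```

Because each update multiplies the current iterate by a positive factor, positivity is preserved without any projection step. With the step size assumed here, the recorded objective values should not increase from one iteration to the next.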

Paper Structure (36 sections, 20 theorems, 124 equations, 12 figures, 3 tables, 4 algorithms)

Introduction
Overview, Motivation
Related Work
Contribution, Organisation
Preliminaries
Basic Notation
Bregman Divergences and Related Definitions
Fisher-Rao Geometry
Specific Manifolds
Positive Orthant
Unit Box
Probability Simplex
Summary
SMART: Convergence and Acceleration
Bregman Proximal Gradient (BPG)
...and 21 more sections

Key Result

Proposition 2.2

Let $\mathcal{A}\subseteq\mathbb{R}^{n}$ be an affine subspace and suppose the convex function $\varphi\colon\mathbb{R}^{n}\to\mathbb{R}\cup \{+\infty\}$ is of Legendre type. If $\operatorname{int}(\operatorname{dom} \varphi) \cap\mathcal{A}\neq\emptyset$, then the restriction $\varphi|_{\mathcal{A}}$ of $\varphi$ to $\mathcal{A}$ is of ...

Figures (12)

Figure 4.1: E-geodesics \eqref{eq:Exp-maps} emanating from three points $\eta$ (black dots) in all directions $v=(\cos \omega,\sin \omega),\,\omega\in[0,2\pi)$ with $t\in\{0.02, 0.05, 0.1, 0.15, 0.2\}$.
Figure 5.1: Toy example: Trajectories of iterates, starting at $x_0=(0.5, 0.5)^{\top}$ and converging to the unique solution $\hat{x}=(1,1)^{\top}$, are displayed in the left panel. The projected gradient method (PG) shows a kink due to its nonsmooth projection step. Methods aware of the constraint geometry produce smoother, more nearly optimal iterate trajectories by exploiting smooth Riemannian geometry to accommodate the box constraints. In the right panel, FSMART-g stands out with larger initial steps and a rapid decrease in the relative objective value.
Figure 5.2: Comparison of CG variants on expander graphs across three instances with varying levels of ill-posedness ($m \in \{40, 70, 100\}$ from left to right). The plots illustrate the relative decrease in objective values over iterations (first row) and over costly operations (second row) for the different rules, detailed in \eqref{eq:beta-CG}, used to select $\beta_k$ when updating the search direction in the conjugate gradient approach. Additionally, Riemannian gradient descent with Armijo line search is considered as a baseline. We observe a rough clustering of the variants into two groups: $\{$OV, PR$\}$ perform similarly to the Armijo baseline, while $\{$DY, FR, HS, HZ$\}$ exhibit notably faster convergence.
Figure 5.3: Comparison of different algorithms on expander graphs across instances with varying levels of ill-posedness ($m \in \{40, 70, 100\}$ from left to right). The plots illustrate the relative decrease in objective values over iterations (first row) and over costly operations (second row) for the different approaches detailed in Section \ref{sec:implDetails}. Among the accelerated algorithms within the Bregman regime, FSMART-e and particularly FSMART-g demonstrate superior performance. Within the Riemannian optimization regime, the conjugate gradient (CG) method with the DY rule \eqref{eq:beta-Dai-Yuan} for updating $\beta_k$ achieves results comparable to FSMART-e.
Figure 5.4: Sparse spike reconstructions from adjacency matrices corresponding to expander graphs, generated by different algorithms, are displayed for $m=40$ after $1000$ iterations. The original binary signal $\hat{x}$ with sparsity $20$ is presented in the top left. Faster algorithms (see Figure \ref{fig:ExperimentExpander}) achieve perfect signal reconstruction. The reconstructions from slower algorithms can, however, be thresholded entrywise at $0.5$ to obtain perfect reconstructions as well.
...and 7 more figures

Theorems & Definitions (39)

Definition 2.1
Proposition 2.2: Legendre functions on affine subspaces [Alvarez:2004]
Theorem 2.3
Lemma 2.4: Bregman projection [Bauschke:1997aa]
Lemma 2.5: Three-points identity [Chen_Teboulle_1993]
Lemma 2.6: probability simplex, Fisher-Rao metric
proof
Lemma 2.7: Riemannian gradient in ambient coordinates
proof
Remark 2.8: Bregman kernels
...and 29 more
