Variable Bregman Majorization-Minimization Algorithm and its Application to Dirichlet Maximum Likelihood Estimation

Ségolène Martin; Jean-Christophe Pesquet; Gabriele Steidl; Ismail Ben Ayed

TL;DR

This work introduces the Variable Bregman Majorization-Minimization (VBMM) framework for convex minimization of F(x)=f(x)+g(x), where f is differentiable on an open set and g is convex and lower semicontinuous (lsc). By letting the Bregman metric adapt at each iteration, VBMM aims to approximate the objective more accurately and to converge faster than traditional Bregman Proximal Gradient methods. The authors prove subsequential convergence to minimizers under mild assumptions on the metrics and apply VBMM to Dirichlet maximum likelihood estimation, deriving a separable, closed-form Majorization-Minimization update and showing that Bregman majorants exist. Numerical experiments on high-dimensional Dirichlet problems indicate that VBMM outperforms existing Newton-type and fixed-metric approaches in both unconstrained and constrained settings. The results highlight VBMM’s practical potential for fast, robust convex optimization and parameter estimation in statistical models.
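
To make the role of the Bregman metric concrete, here is a minimal Python sketch on a toy problem that is not taken from the paper. It assumes the objective f(x) = sum_i (x_i - b_i ln x_i) on the positive orthant, with g = 0, and the weighted Burg-entropy kernel h(x) = -sum_i c_i ln x_i, which yields a valid majorant whenever c_i >= b_i. The names `bregman_mm_step`, `run`, `b`, and `c` are hypothetical. The kernel below is only tailored per coordinate rather than changed across iterations, so this illustrates why a tighter majorant can help; it is not the VBMM algorithm itself.

```python
import math

def f(x, b):
    """Toy objective f(x) = sum_i (x_i - b_i * ln x_i); its minimizer is x = b."""
    return sum(xi - bi * math.log(xi) for xi, bi in zip(x, b))

def bregman_mm_step(y, b, c):
    """Minimize the majorant f(y) + <grad f(y), x - y> + D_h(x, y) over x > 0,
    for the kernel h(x) = -sum_i c_i * ln x_i (assumed, requires c_i >= b_i).
    Setting the gradient to zero gives the closed form x_i = c_i*y_i / (y_i + c_i - b_i)."""
    return [ci * yi / (yi + ci - bi) for yi, bi, ci in zip(y, b, c)]

def run(b, c, x0, n_iter):
    """Iterate the majorization-minimization step and record objective values."""
    x = list(x0)
    history = [f(x, b)]
    for _ in range(n_iter):
        x = bregman_mm_step(x, b, c)
        history.append(f(x, b))
    return x, history

b = [0.5, 2.0, 8.0]
x0 = [1.0, 1.0, 1.0]

# Fixed kernel: one global constant max(b) for every coordinate.
fixed = [max(b)] * len(b)
x_fixed, hist_fixed = run(b, fixed, x0, 20)

# Tailored kernel: c_i = b_i, the tightest admissible choice in this toy.
x_adapted, hist_adapted = run(b, b, x0, 1)
```

In this toy, the tailored kernel reaches the minimizer in a single step, whereas the fixed kernel is still visibly away from it after 20 steps on the coordinate with the smallest b_i. Both runs decrease the objective monotonically, as expected from a majorization-minimization scheme.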

Abstract

We propose a novel Bregman descent algorithm for minimizing a convex function that is expressed as the sum of a differentiable part (defined over an open set) and a possibly nonsmooth term. The approach, referred to as the Variable Bregman Majorization-Minimization (VBMM) algorithm, extends the Bregman Proximal Gradient method by allowing the Bregman function used in the divergence to adaptively vary at each iteration, provided it satisfies a majorizing condition on the objective function. This adaptive framework enables the algorithm to approximate the objective more precisely at each iteration, thereby allowing for accelerated convergence compared to the traditional Bregman Proximal Gradient descent. We establish the convergence of the VBMM algorithm to a minimizer under mild assumptions on the family of metrics used. Furthermore, we introduce a novel application of both the Bregman Proximal Gradient method and the VBMM algorithm to the estimation of the multidimensional parameters of a Dirichlet distribution through the maximization of its log-likelihood. Numerical experiments confirm that the VBMM algorithm outperforms existing approaches in terms of convergence speed.
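
As a point of reference for the Dirichlet application, the following Python sketch evaluates the averaged negative log-likelihood that maximum likelihood estimation minimizes. It uses the standard Dirichlet density; the averaging over samples, the function name `dirichlet_neg_log_likelihood`, and the list-based data layout are assumptions made for illustration, and the paper may use a different normalization or parametrization.

```python
import math

def dirichlet_neg_log_likelihood(alpha, samples):
    """Average negative log-likelihood of Dirichlet(alpha) over `samples`.

    alpha   : list of d positive parameters
    samples : list of M points, each a list of d positive numbers summing to 1
    """
    M = len(samples)
    d = len(alpha)
    # Sufficient statistics: empirical mean of ln p_i for each coordinate.
    mean_log = [sum(math.log(p[i]) for p in samples) / M for i in range(d)]
    # Log of the normalizing constant: sum_i ln Gamma(alpha_i) - ln Gamma(sum_i alpha_i).
    log_norm = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    # Negative log-density averaged over the samples.
    return log_norm - sum((a - 1.0) * t for a, t in zip(alpha, mean_log))

# Usage on two hypothetical samples from the 3-dimensional simplex.
samples = [[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]]
value = dirichlet_neg_log_likelihood([1.0, 1.0, 1.0], samples)
```

The data enter only through the mean log-coordinates, so the cost of evaluating the objective after that precomputation does not depend on the number of samples.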

Paper Structure (14 sections, 8 theorems, 80 equations, 3 figures, 3 algorithms)

Introduction
Problem and algorithm
Convergence analysis
Application to the estimation of the parameter of a Dirichlet distribution
Maximum likelihood estimation for a Dirichlet distribution
Existence of a unique minimizer
Existence of Bregman majorants
Algorithm
Numerical experiments
Unconstrained case
Experimental setup
Results
Use of a separable constraint
Conclusion

Key Result

Proposition 2.1

Under the paper's assumptions on $f$ and $g$ and on the family of Bregman functions, the VBMM algorithm is well-defined and generates a sequence $(\boldsymbol{x}^{(\ell)})_{\ell \ge 1}$ in $\operatorname{dom} g$.
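
For orientation, the display below sketches the generic shape of a Bregman majorization-minimization step that is consistent with the abstract. The Bregman divergence formula is the standard one; the iteration and the majorizing condition are an assumed form, with $h_\ell$ a hypothetical name for the Bregman function chosen at iteration $\ell$. The paper's own definitions and algorithm statement give the precise version.

```latex
% Standard Bregman divergence of a differentiable convex function h
D_h(\boldsymbol{x}, \boldsymbol{y})
  = h(\boldsymbol{x}) - h(\boldsymbol{y})
  - \langle \nabla h(\boldsymbol{y}),\, \boldsymbol{x} - \boldsymbol{y} \rangle

% Assumed majorizing condition on the smooth part f at the current iterate
f(\boldsymbol{x}) \le f(\boldsymbol{x}^{(\ell)})
  + \langle \nabla f(\boldsymbol{x}^{(\ell)}),\, \boldsymbol{x} - \boldsymbol{x}^{(\ell)} \rangle
  + D_{h_\ell}(\boldsymbol{x}, \boldsymbol{x}^{(\ell)})

% Assumed update: minimize the majorant plus the nonsmooth term g
\boldsymbol{x}^{(\ell+1)} \in \operatorname*{argmin}_{\boldsymbol{x}}\;
  g(\boldsymbol{x})
  + \langle \nabla f(\boldsymbol{x}^{(\ell)}),\, \boldsymbol{x} - \boldsymbol{x}^{(\ell)} \rangle
  + D_{h_\ell}(\boldsymbol{x}, \boldsymbol{x}^{(\ell)})
```

Read this way, well-definedness means that the minimization in the update has a solution at every iteration, and that this solution lies in $\operatorname{dom} g$.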

Figures (3)

Figure 1: Curvature function (defined in the paper's equation labeled eq:defcourb) when $\varphi=\ln\Gamma(\cdot+1)$.
Figure 2: Distance to optimum versus time for different values of $\boldsymbol{\alpha}_\text{true}$, with $M=500$ samples and dimension $d=1000$. Rows from top to bottom: $\boldsymbol{\alpha}_\text{true}$ defined with $\boldsymbol{m}_1$, $\boldsymbol{m}_2$, and $\boldsymbol{m}_3$, respectively. Columns from left to right: $\boldsymbol{\alpha}_\text{true}$ defined with $s_1$, $s_2$, and $s_3$, respectively.
Figure 3: (Left) RSE versus elapsed time. (Right) Function $f$ minus the optimal value $f^*$ versus elapsed time. The loss is averaged over 1000 experiments where $\boldsymbol{\alpha}_{\text{true}}$ is uniformly sampled in $[0, 2]^{1000}$. The number of samples for each experiment is $M=500$.

Theorems & Definitions (22)

Definition 2.1: Bregman Divergence
Definition 2.2: Bregman Majorant Function
Remark 2.1
Example 2.1
Proposition 2.1
proof
Remark 2.2
Remark 2.3
Lemma 3.1: Three Points Identity
Lemma 3.2
...and 12 more
