
Mirror Descent and Novel Exponentiated Gradient Algorithms Using Trace-Form Entropies and Deformed Logarithms

Andrzej Cichocki, Toshihisa Tanaka, Frank Nielsen, Sergio Cruces

TL;DR

The paper develops a unified information-geometric framework for Mirror Descent (MD) and Generalized Exponentiated Gradient (GEG) by leveraging trace-form entropies defined through deformed logarithms. By mapping MD/GEG updates to information-geometric structures, it shows how Tsallis, Kaniadakis, Schwämmle–Tsallis, and Kaniadakis–Lissia–Scarfone entropies induce distinct Fisher-like metrics, enabling adaptive, non-Euclidean geometries and robustness to vanishing and exploding gradients. The authors derive explicit MD/GEG update rules for these entropy families and establish a general normalization/geometry-preserving scheme that unifies additive and multiplicative gradient updates under a single framework. This information-geometric perspective provides principled tools for first-order optimization that adapt to data distributions, potentially improving convergence and stability in non-Euclidean settings.
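The unification of additive and multiplicative updates can be illustrated with the Tsallis family. This is a minimal sketch under our own assumptions: the Tsallis q-logarithm log_q(x) = (x^(1-q) - 1)/(1-q) serves as an elementwise mirror map, so one GEG step reads w <- exp_q(log_q(w) - eta * grad L(w)). With q = 0 this is ordinary additive gradient descent (clipped at zero by the [.]_+ cutoff in exp_q), and with q = 1 it is the unnormalized multiplicative exponentiated-gradient update w <- w * exp(-eta * grad L(w)). The toy quadratic loss, the step size, and the names log_q, exp_q, geg_step, and minimize are hypothetical illustrations, not the paper's code, and the paper's normalization scheme is not reproduced here.

```python
import numpy as np


def log_q(x, q):
    """Tsallis q-logarithm (x**(1-q) - 1) / (1-q); the natural log when q == 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(u, q):
    """Tsallis q-exponential [1 + (1-q) u]_+ ** (1/(1-q)), the inverse of log_q.

    Intended for q <= 1 in this sketch; for q > 1 a zero base would map to infinity.
    """
    u = np.asarray(u, dtype=float)
    if np.isclose(q, 1.0):
        return np.exp(u)
    base = np.maximum(1.0 + (1.0 - q) * u, 0.0)
    return base ** (1.0 / (1.0 - q))


def geg_step(w, grad, lr, q):
    """One mirror-descent step with log_q as the (assumed) elementwise mirror map."""
    return exp_q(log_q(w, q) - lr * grad, q)


def minimize(target, q, lr=0.1, steps=1000):
    """Hypothetical toy problem: minimize 0.5 * ||w - target||^2 over w > 0 from w = 1."""
    w = np.ones_like(target, dtype=float)
    for _ in range(steps):
        grad = w - target  # gradient of the quadratic loss
        w = geg_step(w, grad, lr, q)
    return w


target = np.array([0.2, 1.5, 3.0])
for q in (0.0, 0.5, 1.0):
    # q = 0: additive update, q = 1: multiplicative update, q = 0.5: in between
    print(f"q={q}: w = {np.round(minimize(target, q), 4)}")
```

A single parameter q thus moves the update continuously between the additive and multiplicative regimes, while the dual-space step (subtracting eta times the gradient from log_q(w)) stays the same.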

Abstract

This paper introduces a broad class of Mirror Descent (MD) and Generalized Exponentiated Gradient (GEG) algorithms derived from trace-form entropies defined via deformed logarithms. Leveraging these generalized entropies yields MD and GEG algorithms with improved convergence behavior, robustness to vanishing and exploding gradients, and inherent adaptability to non-Euclidean geometries through mirror maps. We establish deep connections between these methods and Amari's natural gradient, revealing a unified geometric foundation for additive, multiplicative, and natural gradient updates. Focusing on the Tsallis, Kaniadakis, Sharma–Taneja–Mittal, and Kaniadakis–Lissia–Scarfone entropy families, we show that each entropy induces a distinct Riemannian metric on the parameter space, leading to GEG algorithms that preserve the natural statistical geometry. The tunable parameters of deformed logarithms enable adaptive geometric selection, providing enhanced robustness and convergence over classical Euclidean optimization. Overall, our framework unifies key first-order MD optimization methods under a single information-geometric perspective based on generalized Bregman divergences, where the choice of entropy determines the underlying metric and dual geometric structure.
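One way to see the link to Amari's natural gradient is sketched here under our own assumptions, using the Kaniadakis kappa-logarithm ln_kappa(x) = (x^kappa - x^(-kappa))/(2 kappa) as an elementwise mirror map. The derivative of the mirror map defines a diagonal metric G(w) = diag(ln_kappa'(w)), and the mirror-descent step exp_kappa(ln_kappa(w) - eta * grad) agrees with the natural-gradient step w - eta * G(w)^(-1) * grad up to O(eta^2). The diagonal, elementwise construction and the names metric_kappa, md_step, and natural_gradient_step are hypothetical choices for illustration, not the paper's exact formulation.

```python
import numpy as np


def ln_kappa(x, kappa):
    """Kaniadakis kappa-logarithm (x**k - x**-k) / (2k); the natural log at k = 0."""
    x = np.asarray(x, dtype=float)
    if kappa == 0:
        return np.log(x)
    return (x ** kappa - x ** (-kappa)) / (2.0 * kappa)


def exp_kappa(u, kappa):
    """Kaniadakis kappa-exponential (sqrt(1 + k^2 u^2) + k u)**(1/k), the inverse of ln_kappa."""
    u = np.asarray(u, dtype=float)
    if kappa == 0:
        return np.exp(u)
    return (np.sqrt(1.0 + kappa**2 * u**2) + kappa * u) ** (1.0 / kappa)


def metric_kappa(w, kappa):
    """Diagonal metric G(w) = ln_kappa'(w) = (w**(k-1) + w**(-k-1)) / 2, which is 1/w at k = 0."""
    w = np.asarray(w, dtype=float)
    return 0.5 * (w ** (kappa - 1.0) + w ** (-kappa - 1.0))


def md_step(w, grad, lr, kappa):
    """Mirror-descent (GEG-style) step taken in the dual coordinates ln_kappa(w)."""
    return exp_kappa(ln_kappa(w, kappa) - lr * grad, kappa)


def natural_gradient_step(w, grad, lr, kappa):
    """Natural-gradient step w - lr * G(w)^{-1} grad with the diagonal metric above."""
    return w - lr * grad / metric_kappa(w, kappa)


w = np.array([0.3, 1.0, 2.5])
grad = np.array([0.7, -1.2, 0.4])
for lr in (1e-2, 1e-3):
    gap = np.max(np.abs(md_step(w, grad, lr, 0.5) - natural_gradient_step(w, grad, lr, 0.5)))
    # The gap should shrink roughly like lr**2 (first-order agreement)
    print(f"lr={lr:g}: max |MD - NG| = {gap:.2e}")
```

Reducing the step size by a factor of 10 should shrink the gap by roughly a factor of 100, which is what first-order equivalence predicts; at kappa = 0 the metric becomes diag(1/w), the familiar geometry behind multiplicative updates.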

Paper Structure

This paper contains 26 sections, 74 equations, and 6 figures.

Figures (6)

  • Figure 1: Plots of the $q$-logarithm $\log_{q}(x)$ and $q$-exponential $\exp_{q}(x)$ functions for different values of the parameter $q$. The figure shows how $q$ controls the degree of concavity of the $q$-logarithm and the degree of convexity of the $q$-exponential. The $q$-logarithm is convex for $q<0$, linear for $q=0$, and increasingly concave for $q>0$, reducing to the classical logarithm for $q=1$; therefore, $q$ should be restricted to $(0,\infty)$ for the $q$-logarithm to be strictly concave. Similarly, the $q$-exponential is concave for $q<0$, linear for $q=0$, and increasingly convex for $q>0$, reducing to the classical exponential for $q=1$.
  • Figure 2: Plots of the $(q,q')$-logarithm and $(q,q')$-exponential functions for different values of the parameters when $q=q'$.
  • Figure 3: Plots of the $(q,q',r)$-logarithm and $(q,q',r)$-exponential functions when the three parameters coincide, $q=q'=r$.
  • Figure 4: Plots of the $\kappa$-logarithm and $\kappa$-exponential functions for different values of the parameter $\kappa$.
  • Figure 5: Surface plots of the $(\kappa,r)$-logarithm for various values of the hyperparameters $\kappa$ and $r$. The left panel shows the $(\kappa,r)$-logarithm as a function of $\kappa$ and $x$ for $r=0$, in which case it coincides with the $\kappa$-logarithm; the black continuous line is the classical logarithm, obtained for $\kappa=0$. The right panel shows the $(\kappa,r)$-logarithm as a function of $r$ and $x$ for $\kappa=0.5$; the black dashed line is the $\kappa$-logarithm for $\kappa=0.5$.
  • ...and 1 more figure
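The curvature statements in the Figure 1 caption and the reduction described in the Figure 5 caption can be checked numerically. This sketch assumes the standard Tsallis form log_q(x) = (x^(1-q) - 1)/(1-q), whose second derivative is -q x^(-q-1), and the Kaniadakis–Lissia–Scarfone form ln_{kappa,r}(x) = x^r (x^kappa - x^(-kappa))/(2 kappa); these closed forms are supplied here rather than quoted from the paper, and the names log_q, ln_kappa_r, and curvature_label are hypothetical.

```python
import numpy as np


def log_q(x, q):
    """Tsallis q-logarithm (x**(1-q) - 1) / (1-q); the natural log at q = 1."""
    x = np.asarray(x, dtype=float)
    if q == 1:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def ln_kappa_r(x, kappa, r):
    """(kappa, r)-logarithm x**r * (x**k - x**-k) / (2k); equals x**r * ln(x) at k = 0."""
    x = np.asarray(x, dtype=float)
    if kappa == 0:
        return x ** r * np.log(x)
    return x ** r * (x ** kappa - x ** (-kappa)) / (2.0 * kappa)


def curvature_label(f, xs, h=1e-3, tol=1e-6):
    """Classify f on xs from central second differences (f(x+h) - 2 f(x) + f(x-h)) / h**2."""
    curv = (f(xs + h) - 2.0 * f(xs) + f(xs - h)) / h**2
    if np.all(np.abs(curv) < tol):
        return "linear"
    if np.all(curv > 0):
        return "convex"
    if np.all(curv < 0):
        return "concave"
    return "mixed"


xs = np.linspace(0.2, 5.0, 50)
for q in (-0.5, 0.0, 0.5, 1.0, 2.0):
    # Analytically, d^2/dx^2 log_q(x) = -q * x**(-q - 1)
    print(f"q={q:+.1f}: log_q is {curvature_label(lambda x: log_q(x, q), xs)}")

# Figure 5, left panel: with r = 0 the (kappa, r)-logarithm is the kappa-logarithm,
# and letting kappa -> 0 recovers the classical logarithm
print(np.allclose(ln_kappa_r(xs, 0.5, 0.0), xs**0.5 - xs**-0.5))
print(np.allclose(ln_kappa_r(xs, 1e-6, 0.0), np.log(xs)))
```

The finite-difference check is only a sanity test on a sampled interval; the sign of -q x^(-q-1) is what establishes the convex, linear, and concave regimes for all x > 0.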