Mirror Descent and Novel Exponentiated Gradient Algorithms Using Trace-Form Entropies and Deformed Logarithms
Andrzej Cichocki, Toshihisa Tanaka, Frank Nielsen, Sergio Cruces
TL;DR
The paper develops a unified information-geometric framework for Mirror Descent (MD) and Generalized Exponentiated Gradient (GEG) algorithms built on trace-form entropies defined through deformed logarithms. By casting MD/GEG updates in information-geometric terms, it shows how the Tsallis, Kaniadakis, Sharma–Taneja–Mittal, and Kaniadakis–Lissia–Scarfone entropies each induce a distinct Riemannian metric on the parameter space. These metrics yield adaptive, non-Euclidean geometries and robustness to vanishing and exploding gradients. The authors derive explicit MD/GEG update rules for these entropy families and establish a general normalized, geometry-preserving scheme that unifies additive and multiplicative gradient updates under a single framework. This information-geometric perspective provides principled first-order optimization tools whose geometry can be tuned to the data, potentially improving convergence and stability in non-Euclidean settings.
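As a hedged illustration of how one update rule can span additive and multiplicative updates, the sketch below implements mirror descent on positive weights with the Tsallis q-logarithm as the mirror map. The function names (`log_q`, `exp_q`, `md_step`) are hypothetical, and the unnormalized, elementwise update is an assumption. The paper's GEG rules may add normalization or use other deformed logarithms.

```python
import numpy as np


def log_q(x, q):
    """Tsallis q-logarithm (x**(1-q) - 1) / (1-q); equals ln(x) at q = 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (np.power(x, 1.0 - q) - 1.0) / (1.0 - q)


def exp_q(y, q):
    """Tsallis q-exponential [1 + (1-q) y]_+ ** (1/(1-q)), the inverse of log_q.

    For q > 1 the argument must stay below 1/(q-1) for the result to be finite.
    """
    y = np.asarray(y, dtype=float)
    if np.isclose(q, 1.0):
        return np.exp(y)
    base = np.maximum(1.0 + (1.0 - q) * y, 0.0)  # cutoff [.]_+ keeps weights >= 0
    return np.power(base, 1.0 / (1.0 - q))


def md_step(w, grad, eta, q):
    """One mirror-descent step with mirror map log_q applied elementwise:
    log_q(w_new) = log_q(w) - eta * grad, i.e. w_new = exp_q(log_q(w) - eta * grad).
    """
    return exp_q(log_q(w, q) - eta * np.asarray(grad, dtype=float), q)


w = np.array([0.5, 1.0, 2.0])   # illustrative positive weights
g = np.array([1.0, -2.0, 0.5])  # illustrative gradient
eta = 0.1

w_add = md_step(w, g, eta, q=0.0)  # q = 0: additive step, max(w - eta*g, 0)
w_mul = md_step(w, g, eta, q=1.0)  # q = 1: multiplicative step, w * exp(-eta*g)
w_mid = md_step(w, g, eta, q=0.5)  # an intermediate geometry
print(w_add, w_mul, w_mid)
```

With q = 0 the mirror map is affine and the step reduces to projected additive gradient descent. At q = 1 it is the unnormalized exponentiated-gradient update, and intermediate values of q interpolate between the two geometries.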
Abstract
This paper introduces a broad class of Mirror Descent (MD) and Generalized Exponentiated Gradient (GEG) algorithms derived from trace-form entropies defined via deformed logarithms. Leveraging these generalized entropies yields MD and GEG algorithms with improved convergence behavior, robustness to vanishing and exploding gradients, and inherent adaptability to non-Euclidean geometries through mirror maps. We establish deep connections between these methods and Amari's natural gradient, revealing a unified geometric foundation for additive, multiplicative, and natural gradient updates. Focusing on the Tsallis, Kaniadakis, Sharma–Taneja–Mittal, and Kaniadakis–Lissia–Scarfone entropy families, we show that each entropy induces a distinct Riemannian metric on the parameter space, leading to GEG algorithms that preserve the natural statistical geometry. The tunable parameters of deformed logarithms enable adaptive geometric selection, providing enhanced robustness and convergence relative to classical Euclidean optimization. Overall, our framework unifies key first-order MD optimization methods under a single information-geometric perspective based on generalized Bregman divergences, where the choice of entropy determines the underlying metric and dual geometric structure.
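The claim that each entropy induces its own metric can be made concrete with a short, generic mirror-descent derivation. This sketch assumes the convex potential F is the integral of a deformed logarithm ln_D, with inverse exp_D, so the mirror map is ln_D applied elementwise to positive weights. It illustrates the standard construction rather than reproducing the paper's exact GEG updates or normalization.

```latex
% Potential built from a deformed logarithm \ln_D (elementwise, w_i > 0)
F(\mathbf{w}) = \sum_i \int_1^{w_i} \ln_D(u)\,du,
\qquad \nabla F(\mathbf{w}) = \ln_D(\mathbf{w})

% Mirror-descent step in the dual (mirror) coordinates
\ln_D(\mathbf{w}_{t+1}) = \ln_D(\mathbf{w}_t) - \eta\,\nabla L(\mathbf{w}_t)
\;\Longrightarrow\;
\mathbf{w}_{t+1} = \exp_D\!\big(\ln_D(\mathbf{w}_t) - \eta\,\nabla L(\mathbf{w}_t)\big)

% Induced diagonal metric; since \exp_D'(\ln_D(w)) = 1/\ln_D'(w), to first order in \eta:
\mathbf{G}(\mathbf{w}) = \nabla^2 F(\mathbf{w}) = \operatorname{diag}\big(\ln_D'(w_i)\big),
\qquad
\mathbf{w}_{t+1} = \mathbf{w}_t - \eta\,\mathbf{G}(\mathbf{w}_t)^{-1}\nabla L(\mathbf{w}_t) + O(\eta^2)

% Metric entries for two of the entropy families
\text{Tsallis: } \ln_q(w) = \frac{w^{1-q}-1}{1-q}, \qquad \ln_q'(w) = w^{-q}
\text{Kaniadakis: } \ln_\kappa(w) = \frac{w^{\kappa}-w^{-\kappa}}{2\kappa}, \qquad \ln_\kappa'(w) = \tfrac{1}{2}\big(w^{\kappa-1}+w^{-\kappa-1}\big)
```

For Tsallis, q = 0 gives the Euclidean metric and plain additive updates. In contrast, q = 1 (equivalently κ = 0 for Kaniadakis) gives G(w)^{-1} = diag(w), the geometry of unnormalized multiplicative (exponentiated-gradient) updates.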
