Generalized Pinsker Inequality for Bregman Divergences of Negative Tsallis Entropies

Guglielmo Beretta; Tommaso Cesari; Roberto Colomboni

TL;DR

This work derives a sharp Pinsker-type inequality for the Bregman divergences $D_\alpha(p\|q)$ generated by the negative $\alpha$-Tsallis entropies on the probability simplex. It gives explicit optimal constants $C_{\alpha,K}$ across regimes that depend on $\alpha$ and the dimension $K$, revealing dimension-free behavior for $\alpha\le1$ and phase transitions at $\alpha=2$ with parity effects in $K$. For multiclass problems with $K\ge3$, a uniform bound exists only when $\alpha\le2$, whereas in the binary case $K=2$ a Pinsker-type bound continues to hold for every $\alpha>2$. The results connect Tsallis losses to $\beta$-divergences, recover the classical KL Pinsker bound at $\alpha=1$, and yield the $L_1$-strong convexity constant of $-S_\alpha$, with implications for online learning and surrogate-to-0-1 regret analyses in multiclass settings.
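The contrast between $K\ge3$ and $K=2$ for $\alpha>2$ can be probed numerically. The sketch below is illustrative only: it assumes the common normalization $S_\alpha(p) = (1-\sum_i p_i^\alpha)/(\alpha-1)$, which may differ from the paper's by a constant factor, and the names `tsallis_bregman` and `pinsker_ratio` are hypothetical helpers. It evaluates the ratio $D_\alpha(p\|q) / (\tfrac12\|p-q\|_1^2)$ at $\alpha=3$ along a family of points whose first two coordinates shrink to zero.

```python
def tsallis_bregman(p, q, alpha):
    """Bregman divergence of phi(x) = (sum_i x_i**alpha - 1) / (alpha - 1).

    Assumed normalization (alpha != 1); the paper's may differ by a constant.
    """
    num = sum(
        pi**alpha - qi**alpha - alpha * qi ** (alpha - 1) * (pi - qi)
        for pi, qi in zip(p, q)
    )
    return num / (alpha - 1)


def pinsker_ratio(p, q, alpha):
    """Ratio D_alpha(p||q) / (0.5 * ||p - q||_1**2)."""
    l1 = sum(abs(pi - qi) for pi, qi in zip(p, q))
    return tsallis_bregman(p, q, alpha) / (0.5 * l1**2)


def shrinking_pair(eps):
    """Two interior points of the 3-simplex differing only in two small coordinates."""
    rest = 1 - 2 * eps
    q = (eps, eps, rest)
    p = (1.5 * eps, 0.5 * eps, rest)
    return p, q


if __name__ == "__main__":
    # K = 3, alpha = 3: the ratio shrinks with eps, so no positive constant works.
    for eps in (1e-1, 1e-2, 1e-3):
        p, q = shrinking_pair(eps)
        print("K=3", eps, pinsker_ratio(p, q, 3))
    # K = 2, alpha = 3: phi is quadratic in the first coordinate, so the ratio is constant.
    print("K=2", pinsker_ratio((0.3, 0.7), (0.6, 0.4), 3))
```

Under this assumed normalization the $K=3$ ratio along this family decays linearly in `eps`, while the $K=2$ ratio does not depend on the chosen pair; this is a sanity check, not a substitute for the paper's proof.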

Abstract

The Pinsker inequality lower-bounds the Kullback--Leibler divergence $D_{\mathrm{KL}}$ in terms of total variation and provides a canonical way to convert $D_{\mathrm{KL}}$ control into $\lVert \cdot \rVert_1$-control. Motivated by applications to probabilistic prediction with Tsallis losses and online learning, we establish a generalized Pinsker inequality for the Bregman divergences $D_\alpha$ generated by the negative $\alpha$-Tsallis entropies, also known as $\beta$-divergences. Specifically, for any $p$, $q$ in the relative interior of the probability simplex $\Delta^K$, we prove the sharp bound \[ D_\alpha(p\Vert q) \ge \frac{C_{\alpha,K}}{2}\cdot \|p-q\|_1^2, \] and we determine the optimal constant $C_{\alpha,K}$ explicitly for every choice of $(\alpha,K)$.
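As a sanity check of the $\alpha=1$ case, the following sketch compares the Tsallis Bregman divergence near $\alpha=1$ with the KL divergence and tests the classical bound $D_{\mathrm{KL}}(p\Vert q) \ge \tfrac12\|p-q\|_1^2$ on random interior points. It assumes the common normalization $S_\alpha(p) = (1-\sum_i p_i^\alpha)/(\alpha-1)$, which may differ from the paper's; all function names are hypothetical helpers introduced for illustration.

```python
import math
import random


def kl(p, q):
    """Kullback-Leibler divergence with natural logarithm."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def tsallis_bregman(p, q, alpha):
    """Bregman divergence of phi(x) = (sum_i x_i**alpha - 1) / (alpha - 1).

    Assumed normalization; alpha == 1 is handled as the KL limit.
    """
    if alpha == 1:
        return kl(p, q)
    num = sum(
        pi**alpha - qi**alpha - alpha * qi ** (alpha - 1) * (pi - qi)
        for pi, qi in zip(p, q)
    )
    return num / (alpha - 1)


def l1_distance(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def random_interior_point(k, rng):
    """Random point with strictly positive coordinates summing to 1."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


def check_classical_pinsker(k=4, trials=1000, seed=0):
    """Return True if KL >= 0.5 * ||p - q||_1**2 held on every sampled pair."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = random_interior_point(k, rng)
        q = random_interior_point(k, rng)
        if kl(p, q) < 0.5 * l1_distance(p, q) ** 2 - 1e-12:
            return False
    return True


if __name__ == "__main__":
    p, q = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
    print(tsallis_bregman(p, q, 1 + 1e-6), kl(p, q))
    print(check_classical_pinsker())
```

Random sampling only fails to find a counterexample; it does not establish sharpness, which is the content of the paper's theorem.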

Paper Structure (15 sections, 10 theorems, 118 equations, 1 figure, 1 table)


Introduction and Related Works
Other Generalized Pinsker Inequalities.
Pinsker Inequality for Bregman Divergences of Tsallis Entropies
Geometric Intuition Behind the Proof of Theorem 8
Toward the Proof: Useful Results
The Proof
Conclusion
Pinsker Inequality: Total Variation vs. L1 Norm
Tsallis Losses and Their Bayes and Excess Risk
$0$--$1$ Loss Regret is Controlled by $\left\lVert\cdot\right\rVert_1$ distance
Proof of the Remarks
Proof of Remark \ref{rem:sigma}: Estimating the Constant $\sigma_{\alpha,K}$
Nonexistence of Pinsker-Type Inequality for $\alpha > 2$ and $K \ge 3$
Pinsker Inequality for Csiszár $f$-Divergences
Weaker Inequalities for $\alpha > 2$ and $K \ge 3$

Key Result

Theorem 8

Let $\alpha \in \mathbb{R}$ and denote by $C_{\alpha, K}$ the largest $C \geq 0$ such that, for every $p$, $q \in \operatorname{relint}\left(\Delta^{K}\right)$, \[ D_\alpha(p\Vert q) \ge \frac{C}{2}\cdot \|p-q\|_1^2. \] Then $C_{\alpha,K}$ is given explicitly, regime by regime in $(\alpha, K)$, by an expression involving the term $\sigma_{\alpha, K} \coloneq \left( \frac{ (1-\frac{1}{K})^{\frac{1 - \alpha}{3- \alpha}} + (1+\frac{1}{K})^{\frac{1 - \alpha}{3- \alpha}} }{2} \right)^{3 - \alpha}$ (see Remark \ref{rem:sigma} for more about this term).
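The term $\sigma_{\alpha,K}$ is straightforward to evaluate numerically. The sketch below transcribes the formula above directly; the function name `sigma` and the decision to reject $\alpha=3$ (where the exponent $\frac{1-\alpha}{3-\alpha}$ is undefined) are choices made here for illustration, not taken from the paper.

```python
def sigma(alpha, K):
    """Evaluate sigma_{alpha,K} as written in the theorem statement.

    sigma = (((1 - 1/K)**e + (1 + 1/K)**e) / 2) ** (3 - alpha),
    with e = (1 - alpha) / (3 - alpha).
    """
    if K < 2:
        raise ValueError("K must be at least 2")
    if alpha == 3:
        raise ValueError("the exponent (1 - alpha) / (3 - alpha) is undefined at alpha = 3")
    e = (1 - alpha) / (3 - alpha)
    mean = ((1 - 1 / K) ** e + (1 + 1 / K) ** e) / 2
    return mean ** (3 - alpha)


if __name__ == "__main__":
    for K in (3, 5, 11, 101):
        print(K, sigma(2, K), sigma(0.5, K))
```

At $\alpha=1$ the exponent vanishes, so the expression equals 1 for every $K$; at $\alpha=2$ it simplifies algebraically to $K^2/(K^2-1)$, which gives a quick check of the transcription.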

Figures (1)

Figure 1: Sharp Pinsker constants $C_{\alpha,K}$ for selected $K$. Dashed lines mark $\alpha=1,2,3$ where the phase transitions occur.

Theorems & Definitions (41)

Definition 1: Tsallis entropies
Remark 2
Definition 3: Bregman divergences
Remark 4
Remark 5
Definition 6: $\beta$-divergences
Remark 7
Theorem 8: Pinsker Inequality for Bregman divergences of Tsallis Entropies
Corollary 9
Remark 10
...and 31 more
