On Semi-supervised Estimation of Discrete Distributions under f-divergences

Hasan Sabri Melihcan Erol; Lizhong Zheng

TL;DR

The paper addresses semi-supervised estimation of the joint pmf $p_{XY}$ from a mix of labeled and unlabeled samples under the minimax framework. It shows that composing univariate minimax estimators attains the minimax risk with the optimal first-order constant for $l^p_p$ losses with $1 \le p \le 2$, and extends this result to a broad family of $f$-divergences, including KL, $\chi^2$, Squared Hellinger, and Le Cam. The authors derive explicit rates and constants, such as $R^p_{m,n} = |\mathcal X|^{1 - p/2} C_p m^{-p/2}$ and $R^f_{n,m} = |\mathcal X| C_f / m$, and prove minimax optimality of the composition estimators in the semi-supervised setting. These results give rigorous guarantees for discrete pmf estimation when unlabeled data are abundant and labeling is costly, across multiple divergence criteria.
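To make the loss criteria named above concrete, here is a minimal Python sketch of the $l^p_p$ loss and the four listed $f$-divergences between two pmfs on the same finite alphabet. The function names are hypothetical, and the conventions are assumptions rather than the paper's definitions: the sketch uses the standard form $D_f(p \,\|\, q) = \sum_i q_i f(p_i / q_i)$ with $q_i > 0$, the unnormalized squared Hellinger distance, and the Le Cam divergence with a factor $1/2$. The paper may order the arguments or normalize differently.

```python
import numpy as np

def lpp_loss(p, q, power):
    """l^p_p loss between two pmfs: sum_i |p_i - q_i|^power."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(np.abs(p - q) ** power))

def f_divergence(p, q, f):
    """D_f(p || q) = sum_i q_i * f(p_i / q_i); assumes q_i > 0 for every i."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

def f_kl(t):
    """Generator t * log(t), with the convention 0 * log(0) = 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] * np.log(t[pos])
    return out

def f_chi2(t):
    """Generator (t - 1)^2, giving sum_i (p_i - q_i)^2 / q_i."""
    return (t - 1.0) ** 2

def f_hellinger(t):
    """Generator (sqrt(t) - 1)^2, giving sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return (np.sqrt(t) - 1.0) ** 2

def f_le_cam(t):
    """Generator (1 - t)^2 / (2t + 2), giving sum_i (p_i - q_i)^2 / (2 (p_i + q_i))."""
    return (1.0 - t) ** 2 / (2.0 * t + 2.0)

# Example: compare a true pmf p with an estimate q.
p = np.array([0.5, 0.5])
q = np.array([0.25, 0.75])
for name, f in [("KL", f_kl), ("chi2", f_chi2),
                ("Hellinger^2", f_hellinger), ("Le Cam", f_le_cam)]:
    print(name, f_divergence(p, q, f))
print("l1", lpp_loss(p, q, 1), "l2^2", lpp_loss(p, q, 2))
```

All four divergences share the same `f_divergence` routine and differ only in the generator `f`, which is why results stated for a family of $f$-divergences can cover them together.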

Abstract

We study the problem of estimating the joint probability mass function (pmf) over two random variables. In particular, the estimation is based on the observation of $m$ samples containing both variables and $n$ samples missing one fixed variable. We adopt the minimax framework with $l^p_p$ loss functions. Recent work established that compositions of univariate minimax estimators achieve the minimax risk with the optimal first-order constant for $p \ge 2$ in the regime $m = o(n)$, but questions remained for $p \le 2$ and for various $f$-divergences. In our study, we affirm that these composite estimators are indeed minimax optimal for $l^p_p$ loss functions in the range $1 \le p \le 2$, including the critical $l_1$ loss. Additionally, we establish their optimality for a suite of $f$-divergences, such as the KL, $\chi^2$, Squared Hellinger, and Le Cam divergences.
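The composition idea described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's estimator: it assumes the unlabeled samples are the ones missing $Y$, it uses add-constant smoothing as a stand-in for the univariate minimax estimators, and it pools the $X$ values from both sample sets for the marginal. The names `add_constant_estimate` and `compose_joint` are hypothetical.

```python
import numpy as np

def add_constant_estimate(counts, alpha=1.0):
    """Stand-in univariate pmf estimator (add-alpha smoothing).

    Assumption: this replaces the univariate minimax estimator used in the
    paper, whose exact form is not given in this summary.
    """
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)

def compose_joint(labeled, unlabeled_x, num_x, num_y, alpha=1.0):
    """Estimate p_XY(x, y) as (estimate of p_X(x)) * (estimate of p_{Y|X}(y | x)).

    labeled:     sequence of (x, y) pairs, the m complete samples
    unlabeled_x: sequence of x values, the n samples missing Y
    """
    labeled = np.asarray(labeled, dtype=int).reshape(-1, 2)
    # Marginal of X: pool the X observations from both sample sets (an assumption).
    all_x = np.concatenate([np.asarray(unlabeled_x, dtype=int), labeled[:, 0]])
    p_x = add_constant_estimate(np.bincount(all_x, minlength=num_x), alpha)
    # Conditional of Y given X = x: only the labeled pairs carry information.
    joint_counts = np.zeros((num_x, num_y))
    np.add.at(joint_counts, (labeled[:, 0], labeled[:, 1]), 1.0)
    p_y_given_x = np.vstack([add_constant_estimate(row, alpha) for row in joint_counts])
    # Compose: each row x of the conditional is scaled by the marginal mass of x.
    return p_x[:, None] * p_y_given_x

# Example with |X| = 3, |Y| = 2, m = 3 labeled and n = 4 unlabeled samples.
labeled = [(0, 1), (1, 0), (1, 1)]
unlabeled_x = [0, 0, 1, 2]
estimate = compose_joint(labeled, unlabeled_x, num_x=3, num_y=2)
print(estimate)
```

When $n$ is much larger than $m$, the marginal is estimated far more accurately than the conditionals, which matches the regime $m = o(n)$ in which the stated rates depend only on $m$.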

Paper Structure (17 sections, 13 theorems, 26 equations)

Introduction
Notations & Preliminaries
$l^p_p$ losses for $1 \le p < 2$
$f$-divergences
Results
$l^p_p$ loss functions
$f$-divergences
Proofs for Theorems
$l^p_p$ loss functions
$f$-divergences
$\xi^{U_n, L_m}_{y,x} = p_X(x)$
$\xi^{U_n, L_m}_{y,x} =\check{q}_X(x) = \frac{1}{\left|{\mathcal{X}}\right|}$
Supplementary Results
Conclusion
Supplementary Results (Cont'd)
...and 2 more sections

Key Result

Theorem 1

Let $\hat{q}^*_{n}$ be a minimax optimal estimator for $r^p_n$. Then the conditional composition $\hat{q}^{*,m}_{Y\mid X}$ based on $\hat{q}^{*}_n$ is minimax optimal for $\bar{R}^p_m$.

Theorems & Definitions (13)

Theorem 1: Theorem 1 of onsemsup
Theorem 2
Theorem 3: Theorem 3 of onsemsup
Theorem 4
Theorem 5
Corollary 1
Theorem 6
Theorem 7
Lemma 1
Lemma 2
...and 3 more
