Risk Bounds for Mixture Density Estimation on Compact Domains via the $h$-Lifted Kullback--Leibler Divergence
Mark Chiu Chong, Hien Duy Nguyen, TrungTin Nguyen
TL;DR
This work introduces the $h$-lifted KL divergence $KL_h$, which generalizes KL-based risk bounds for finite mixtures on compact domains to densities that may vanish. It defines the maximum $h$-lifted likelihood estimator ($h$-MLLE) and proves oracle-type risk bounds of the form $\mathbb{E}\{KL_h(f\|f_{k,n})\}-KL_h(f\|\mathcal{C}) \le c_1/(k+2) + c_2/\sqrt{n}$, where the second term depends on the complexity of the component class and can be controlled via covering numbers; under a Lipschitz condition the bound simplifies further. The $KL_h$ divergence is shown to be a Bregman divergence that is bounded for general continuous densities and related to $L_p$ distances, so the analysis does not require strict positivity assumptions. The authors provide an MM algorithm for computing $h$-MLLEs and present beta-mixture experiments that exhibit the predicted rates and the elbow phenomenon. Overall, the paper extends classical KL-based results for mixture density estimation on compact domains to broader density classes, with an estimator that remains computable in practice.
Abstract
We consider the problem of estimating probability density functions based on sample data, using a finite mixture of densities from some component class. To this end, we introduce the $h$-lifted Kullback--Leibler (KL) divergence as a generalization of the standard KL divergence and a criterion for conducting risk minimization. Under a compact support assumption, we prove an $\mathcal{O}(1/\sqrt{n})$ bound on the expected estimation error when using the $h$-lifted KL divergence, which extends the results of Rakhlin et al. (2005, ESAIM: Probability and Statistics, Vol. 9) and Li and Barron (1999, Advances in Neural Information Processing Systems, Vol. 12) to permit the risk bounding of density functions that are not strictly positive. We develop a procedure for the computation of the corresponding maximum $h$-lifted likelihood estimators ($h$-MLLEs) using the Majorization-Maximization framework and provide experimental results in support of our theoretical bounds.
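To illustrate why lifting helps with densities that are not strictly positive, the following is a minimal numerical sketch. It assumes the lifted divergence takes the form $\int (f+h)\log\{(f+h)/(g+h)\}\,d\mu$ for a positive lifting function $h$; this form is not stated in the abstract and is an assumption here. The helper names `trapezoid` and `lifted_kl`, the uniform choice of $h$, and the example densities are all hypothetical and chosen only for illustration.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoid rule, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def lifted_kl(f, g, h, grid):
    """Approximate the ASSUMED lifted divergence: integral of (f+h) log((f+h)/(g+h))."""
    fv, gv, hv = f(grid), g(grid), h(grid)
    return trapezoid((fv + hv) * np.log((fv + hv) / (gv + hv)), grid)

# Illustrative densities on the compact domain [0, 1].
grid = np.linspace(0.0, 1.0, 100_001)
f = lambda x: np.ones_like(x)                # uniform target density
g = lambda x: np.where(x <= 0.5, 2.0, 0.0)   # candidate that vanishes on (0.5, 1]
h = lambda x: np.ones_like(x)                # hypothetical lifting function (uniform)

value = lifted_kl(f, g, h, grid)        # finite even though g vanishes where f > 0
self_value = lifted_kl(f, f, h, grid)   # divergence of f from itself
print(value, np.log(4.0 / 3.0), self_value)
```

In this example the ordinary KL divergence of $f$ from $g$ is infinite, because $g$ is zero on a set where $f$ is positive. Under the assumed form, the lifted value has the closed form $\log(4/3)$, which the numerical approximation should match up to discretization error.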
