E$^2$M: Double Bounded $α$-Divergence Optimization for Tensor-based Discrete Density Estimation
Kazu Ghalamkari, Jesper Løve Hinrich, Morten Mørup
TL;DR
The paper addresses density estimation over discrete, tensor-structured data by minimizing the $\alpha$-divergence between an empirical tensor and a mixture of low-rank tensor components. It introduces the E$^2$M algorithm, an EM framework built on a double upper bound. The framework first relaxes the $\alpha$-divergence objective $L_{\alpha}$ to a KL-based surrogate. It then applies a many-body approximation (MBA) in the M-step to obtain closed-form updates for CP, Tucker, and Tensor Train (TT) components and their mixtures, together with an adaptive background term. The theoretical guarantees include monotone convergence and exact MBA solutions for these common tensor formats, which enables scalable joint optimization across multiple low-rank structures. The experiments cover optimization behavior, robustness to outliers controlled by $\alpha$, and real-data classification and density estimation. They show that E$^2$M with structured mixtures outperforms single-structure baselines and remains stable without gradient-based tuning. Overall, the framework offers a versatile approach to tensor-based density learning with guaranteed convergence, flexible mixtures, and robust handling of outliers.
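The TL;DR centers on the $\alpha$-divergence between an empirical tensor $P$ and a model tensor $Q$. The paper's exact convention is not stated here, so the sketch below assumes the generalized (unnormalized) Amari form $D_\alpha(P\|Q)=\frac{1}{\alpha(1-\alpha)}\sum_{\mathbf{i}}\big(\alpha p_{\mathbf{i}}+(1-\alpha)q_{\mathbf{i}}-p_{\mathbf{i}}^{\alpha}q_{\mathbf{i}}^{1-\alpha}\big)$. This form recovers $\mathrm{KL}(P\|Q)$ as $\alpha\to1$. The $p^{\alpha}q^{1-\alpha}$ factor is the kind of $\alpha$-power term that the abstract names as the obstacle to closed-form updates. Tensors are flattened to plain lists, and the function names are hypothetical.

```python
import math


def alpha_divergence(p, q, alpha):
    """Alpha-divergence between two nonnegative tensors given as flat lists.

    Assumed convention (generalized Amari form, not necessarily the paper's):
        D_alpha(p || q) = sum(alpha*p + (1 - alpha)*q - p**alpha * q**(1 - alpha))
                          / (alpha * (1 - alpha))
    It tends to KL(p || q) as alpha -> 1 and to KL(q || p) as alpha -> 0.
    Assumes q > 0 everywhere (and also p > 0 when alpha < 0).
    """
    if alpha in (0.0, 1.0):
        raise ValueError("alpha = 0 or 1 is the KL limit; use kl_divergence instead")
    total = sum(
        alpha * pi + (1 - alpha) * qi - pi**alpha * qi ** (1 - alpha)
        for pi, qi in zip(p, q)
    )
    return total / (alpha * (1 - alpha))


def kl_divergence(p, q):
    """Generalized KL divergence sum(p*log(p/q) - p + q), using 0*log(0) = 0."""
    return sum(
        (pi * math.log(pi / qi) if pi > 0 else 0.0) - pi + qi
        for pi, qi in zip(p, q)
    )


# Toy flattened empirical tensor p (normalized counts) and model tensor q.
p = [0.5, 0.3, 0.2, 0.0]
q = [0.4, 0.3, 0.2, 0.1]
for a in (0.5, 0.9, 1 - 1e-6):
    print(f"alpha={a:.6f}  D={alpha_divergence(p, q, a):.6f}")
print(f"KL(p||q)        ={kl_divergence(p, q):.6f}")
```

For $\alpha = 0.5$ the sum reduces to $2\sum_{\mathbf{i}}(\sqrt{p_{\mathbf{i}}}-\sqrt{q_{\mathbf{i}}})^2$, a scaled squared Hellinger distance.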
Abstract
Tensor-based discrete density estimation requires flexible modeling and proper divergence criteria to enable effective learning. However, traditional approaches using $α$-divergence face analytical challenges because the $α$-power terms in the objective function hinder the derivation of closed-form update rules. We present a generalization of the expectation-maximization (EM) algorithm, called the E$^2$M algorithm. It circumvents this issue in two steps. First, it relaxes the optimization into the minimization of a surrogate objective based on the Kullback-Leibler (KL) divergence, which is tractable via the standard EM algorithm. It then applies a tensor many-body approximation in the M-step to enable simultaneous closed-form updates of all parameters. Our approach offers flexible modeling not only for a variety of low-rank structures, including the CP, Tucker, and Tensor Train formats, but also for their mixtures, allowing us to leverage the strengths of different low-rank structures. We demonstrate the effectiveness of our approach in classification and density estimation tasks.
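The abstract states that the KL-based surrogate is tractable via the standard EM algorithm. For reference, the sketch below shows that standard EM for a single nonnegative CP (rank-$R$, latent-class) model fitted to an empirical 3-way tensor under KL. It is not E$^2$M itself. It omits the $\alpha$-dependent bound, the many-body approximation, Tucker/TT and mixed structures, and the background term. The names `cp_prob` and `em_cp` and the toy data are hypothetical, for illustration only.

```python
import itertools
import math
import random

# Hypothetical illustration: plain KL-EM for one CP model, not the E^2M algorithm.


def cp_prob(w, factors, idx):
    """q(idx) = sum_r w[r] * prod_m factors[m][r][idx[m]] (rank-R CP model)."""
    return sum(
        w[r] * math.prod(f[r][i] for f, i in zip(factors, idx)) for r in range(len(w))
    )


def em_cp(p, shape, rank, iters=100, seed=0):
    """Standard EM for min KL(p || q) over normalized, nonnegative CP models q.

    p maps index tuples to empirical probabilities that sum to 1.
    Returns mixing weights w, per-mode factor distributions, and the KL history.
    """
    rng = random.Random(seed)

    def random_simplex(n):
        x = [rng.random() + 0.1 for _ in range(n)]  # strictly positive start
        s = sum(x)
        return [v / s for v in x]

    w = random_simplex(rank)
    factors = [[random_simplex(n) for _ in range(rank)] for n in shape]
    history = []
    for _ in range(iters):
        new_w = [0.0] * rank
        new_f = [[[0.0] * n for _ in range(rank)] for n in shape]
        kl = 0.0
        for idx, v in p.items():
            if v == 0:
                continue
            terms = [
                w[r] * math.prod(f[r][i] for f, i in zip(factors, idx))
                for r in range(rank)
            ]
            q = sum(terms)
            kl += v * math.log(v / q)  # KL(p || q) of the current model
            for r in range(rank):
                # E-step: posterior of component r, weighted by p(idx)
                resp = v * terms[r] / q
                new_w[r] += resp
                for m, i in enumerate(idx):
                    new_f[m][r][i] += resp
        history.append(kl)
        # M-step: closed form, renormalize the expected counts
        w = new_w
        factors = [
            [[x / new_w[r] for x in new_f[m][r]] for r in range(rank)]
            for m in range(len(shape))
        ]
    return w, factors, history


# Usage: fit a rank-2 CP model to normalized counts of a 3 x 3 x 2 tensor.
rng = random.Random(1)
shape = (3, 3, 2)
counts = {idx: rng.randint(0, 9) for idx in itertools.product(*(range(n) for n in shape))}
total = sum(counts.values())
p = {idx: c / total for idx, c in counts.items()}
w, factors, history = em_cp(p, shape, rank=2)
print(history[0], history[-1])
```

Each iteration does not increase $\mathrm{KL}(P\|Q)$, the usual EM monotonicity. The paper's convergence guarantee concerns its own double bound, which this sketch does not reproduce.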
