E$^2$M: Double Bounded $\alpha$-Divergence Optimization for Tensor-based Discrete Density Estimation

Kazu Ghalamkari; Jesper Løve Hinrich; Morten Mørup

TL;DR

The paper addresses density estimation over discrete, tensor-structured data by optimizing the $\alpha$-divergence between an empirical tensor and a mixture of low-rank tensor components. It introduces the E$^2$M algorithm, a double-upper-bound EM framework that relaxes $L_{\alpha}$ to a KL-based surrogate and then applies a many-body approximation (MBA) in the M-step to obtain closed-form updates for CP, Tucker, TT, and their mixtures, along with an adaptive background term. Theoretical guarantees include monotone convergence and exact MBA solutions for common tensor formats, enabling scalable, joint optimization across multiple low-rank structures. Empirical results on optimization, robustness to outliers via $\alpha$, and real-data classification and density estimation demonstrate that E$^2$M with structured mixtures outperforms single-structure baselines and remains stable without gradient-based tuning. Overall, the framework provides a versatile, convergence-guaranteed approach for tensor-based density learning with flexible mixtures and robust handling of outliers.
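To make the phrase "closed-form updates" concrete, here is a minimal sketch of the standard EM iteration for a non-negative CP model under the plain KL divergence. It is not the E$^2$M algorithm itself: it covers only the KL surrogate stage and omits the $\alpha$-dependent bound, the many-body approximation, mixtures of structures, and the background term. The function names (`kl`, `cp_tensor`, `em_step`), the rank, and the random data are assumptions made for illustration.

```python
import numpy as np

def kl(T, P):
    """KL(T || P) for normalized non-negative tensors; 0 * log 0 is taken as 0."""
    mask = T > 0
    return float(np.sum(T[mask] * np.log(T[mask] / P[mask])))

def cp_tensor(w, A, B, C):
    """P_ijk = sum_r w_r A_ir B_jr C_kr, a mixture of R rank-1 distributions."""
    return np.einsum("r,ir,jr,kr->ijk", w, A, B, C)

def em_step(T, w, A, B, C):
    # E-step: responsibility of component r for each cell (i, j, k).
    joint = np.einsum("r,ir,jr,kr->ijkr", w, A, B, C)
    resp = joint / joint.sum(axis=-1, keepdims=True)
    # M-step: closed-form updates from expected counts M_ijkr = T_ijk * resp_ijkr.
    M = T[..., None] * resp
    w_new = M.sum(axis=(0, 1, 2))
    A_new = M.sum(axis=(1, 2)) / w_new
    B_new = M.sum(axis=(0, 2)) / w_new
    C_new = M.sum(axis=(0, 1)) / w_new
    return w_new, A_new, B_new, C_new

# Hypothetical toy problem: a random normalized 4 x 5 x 6 tensor, rank R = 3.
rng = np.random.default_rng(0)
T = rng.random((4, 5, 6))
T /= T.sum()
R = 3
w = np.full(R, 1.0 / R)
A, B, C = (rng.random((n, R)) for n in T.shape)
A, B, C = (X / X.sum(axis=0) for X in (A, B, C))  # columns sum to one

history = [kl(T, cp_tensor(w, A, B, C))]
for _ in range(50):
    w, A, B, C = em_step(T, w, A, B, C)
    history.append(kl(T, cp_tensor(w, A, B, C)))
```

Every factor is updated simultaneously from the same expected counts, with no step size to tune, and `history` is non-increasing, which is the usual EM monotonicity property.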

Abstract

Tensor-based discrete density estimation requires flexible modeling and proper divergence criteria to enable effective learning; however, traditional approaches using the $\alpha$-divergence face analytical challenges due to the $\alpha$-power terms in the objective function, which hinder the derivation of closed-form update rules. We present a generalization of the expectation-maximization (EM) algorithm, called the E$^2$M algorithm. It circumvents this issue by first relaxing the optimization into the minimization of a surrogate objective based on the Kullback-Leibler (KL) divergence, which is tractable via the standard EM algorithm, and subsequently applying a tensor many-body approximation in the M-step to enable simultaneous closed-form updates of all parameters. Our approach offers flexible modeling not only for a variety of low-rank structures, including the CP, Tucker, and Tensor Train formats, but also for their mixtures, thus allowing us to leverage the strengths of different low-rank structures. We demonstrate the effectiveness of our approach in classification and density estimation tasks.
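The $\alpha$-power terms mentioned above can be seen in a small numerical sketch. The code assumes the common Amari-type form $D_\alpha(\mathcal{T},\mathcal{P}) = \frac{1}{\alpha(1-\alpha)}\bigl(1-\sum_{\boldsymbol{i}}\mathcal{T}_{\boldsymbol{i}}^{\alpha}\mathcal{P}_{\boldsymbol{i}}^{1-\alpha}\bigr)$ for normalized tensors; the paper's exact definition and scaling are not shown in this excerpt and may differ. The function names and the random tensors are hypothetical.

```python
import numpy as np

def alpha_divergence(T, P, alpha):
    """Amari-type alpha-divergence between normalized tensors, 0 < alpha < 1.

    Assumed form: (1 - sum_i T_i^alpha * P_i^(1 - alpha)) / (alpha * (1 - alpha)).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    overlap = np.sum(T**alpha * P**(1.0 - alpha))
    return float((1.0 - overlap) / (alpha * (1.0 - alpha)))

def kl(T, P):
    """KL(T || P) for strictly positive normalized tensors."""
    return float(np.sum(T * np.log(T / P)))

# Hypothetical strictly positive, normalized tensors.
rng = np.random.default_rng(1)
T = rng.random((3, 4, 5))
T /= T.sum()
P = rng.random((3, 4, 5))
P /= P.sum()

values = {a: alpha_divergence(T, P, a) for a in (0.1, 0.5, 0.9, 0.999)}
```

Under this assumed form, the divergence approaches KL$(\mathcal{T}\,\|\,\mathcal{P})$ as $\alpha \to 1$. For $\alpha < 1$ the product $\mathcal{T}^{\alpha}\mathcal{P}^{1-\alpha}$ couples the model to the data through a power rather than a logarithm, which is why the sum over components does not separate as it does in the KL case.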

Paper Structure (40 sections, 9 theorems, 80 equations, 11 figures, 4 tables, 2 algorithms)

Introduction
Related work
Problem setup
Tensor-based density estimation via $\alpha$-divergence optimization
Two upper bounds of the $\alpha$-divergence deriving E$^2$M algorithm
E$^2$M-algorithm for mixture of low-rank tensors
Many-body approximation meets E$^2$M algorithm
Many-body approximation with exact solution
Scalability of the E$^2$M-algorithm for tensor learning with closed-form updates
Numerical Experiments
Discussion and Conclusion
Proofs
Proofs for E$^2$M-algorithm
Notation and definition
Proofs for exact solutions of many-body approximation
...and 25 more sections

Key Result

Proposition 1

For any non-negative normalized tensor $\mathcal{T}$, any tensor $\mathcal{F}$ that satisfies $\sum_{\boldsymbol{i}\in\Omega_{I}}\mathcal{T}_{\boldsymbol{i}}^\alpha\mathcal{F}_{\boldsymbol{i}} =1$, and any real number $\alpha\in(0,1)$, the function $L_{\alpha}(\mathcal{P})$ in Equation eq:objective

Figures (11)

Figure 1: (a) A discrete density estimation by $N$ samples $\bm{x}^{(1)},\dots,\bm{x}^{(N)}$ for $\bm{x}^{(n)}=(x^{(n)}_1,x^{(n)}_2,x^{(n)}_3)$ and $x^{(n)}_d \in [I_d]$. The empirical distribution $p(\bm{x})$ is identical to a non-negative normalized tensor $\mathcal{T}$, and the true distribution is estimated by its low-rank approximation $\mathcal{P}$. (b) The E$^2$M algorithm includes two E-steps. The E1-step makes the upper bound tight w.r.t. the objective function, and the E2-step makes the upper bound of the upper bound tight w.r.t. the upper bound.
Figure 2: Interaction diagram for (a) $\mathcal{Q}_{ijkl}=\mathcal{A}_{ijk}B_{ij}C_{il}D_{jl}$, (b) $\mathcal{Q}^{[\mathrm{CP}]}$, (c) $\mathcal{Q}^{[\mathrm{Tucker}]}$, and (d) $\mathcal{Q}^{[\mathrm{TT}]}$.
Figure 3: Comparison of the number of iterations required to reconstruct the SIPI House color image.
Figure 4: The leftmost panel shows the empirical distribution with outliers, and the subsequent panels show its reconstructions by E$^2$MCPTTB using different $\alpha$ values.
Figure 5: An example of a tensor tree structure represented by the tensor network.
...and 6 more figures
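
The captions of Figures 1(a) and 2(a) can be illustrated with a short sketch: building the empirical tensor $\mathcal{T}$ from samples, and evaluating the interaction pattern $\mathcal{Q}_{ijkl}=\mathcal{A}_{ijk}B_{ij}C_{il}D_{jl}$ with `numpy.einsum`. The sample data, tensor sizes, 0-based indexing, and the helper name `empirical_tensor` are assumptions for illustration; the final normalization of $\mathcal{Q}$ is likewise an assumption about how such a product would be turned into a density.

```python
import numpy as np

def empirical_tensor(samples, shape):
    """Normalized count tensor: T[x] = (number of samples equal to x) / N."""
    T = np.zeros(shape)
    # np.add.at accumulates correctly even when the same index repeats.
    np.add.at(T, tuple(samples.T), 1.0)
    return T / len(samples)

rng = np.random.default_rng(2)

# Figure 1(a): hypothetical N = 500 samples x = (x1, x2, x3), 0-based indices.
shape = (3, 4, 2)
samples = np.stack([rng.integers(0, n, size=500) for n in shape], axis=1)
T = empirical_tensor(samples, shape)

# Figure 2(a): one three-body factor A_ijk and three two-body factors.
I, J, K, L = 2, 3, 4, 5
A = rng.random((I, J, K))
B = rng.random((I, J))
C = rng.random((I, L))
D = rng.random((J, L))
Q_raw = np.einsum("ijk,ij,il,jl->ijkl", A, B, C, D)
Q = Q_raw / Q_raw.sum()  # normalize so that Q is a discrete density
```

Each subscript group in the `einsum` string corresponds to one interaction in the diagram, so changing the pattern of factors only requires editing that string.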

Theorems & Definitions (10)

Proposition 1
Remark 1
Proposition 2
Proposition 3
Proposition 4
Proposition 5
Theorem 1
Theorem 2: Optimal M-step in CP decomposition [huang2017kullback]
Proposition 6: The optimal M-step in Tucker decomposition
Proposition 7: The optimal M-step in Tensor Train decomposition
