E$^2$M: Double Bounded $α$-Divergence Optimization for Tensor-based Discrete Density Estimation

Kazu Ghalamkari, Jesper Løve Hinrich, Morten Mørup

TL;DR

The paper addresses density estimation over discrete, tensor-structured data by optimizing the $\alpha$-divergence between an empirical tensor and a mixture of low-rank tensor components. It introduces the E$^2$M algorithm, a double-upper-bound EM framework that relaxes the objective $L_{\alpha}$ to a KL-based surrogate and then applies a many-body approximation (MBA) in the M-step to obtain closed-form updates for CP, Tucker, Tensor Train (TT), and their mixtures, along with an adaptive background term. Theoretical guarantees include monotone convergence and exact MBA solutions for common tensor formats, enabling scalable, joint optimization across multiple low-rank structures. Experiments on optimization, on robustness to outliers controlled by $\alpha$, and on real-data classification and density estimation indicate that E$^2$M with structured mixtures outperforms single-structure baselines and remains stable without gradient-based tuning. Overall, the framework provides a versatile, convergence-guaranteed approach for tensor-based density learning with flexible mixtures and robust handling of outliers.
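
As a point of reference for the $\alpha$-divergence mentioned above, here is a minimal sketch assuming the Amari form $D_{\alpha}(\mathcal{T}\,\|\,\mathcal{P}) = \frac{1}{\alpha(\alpha-1)}\left(\sum_{\boldsymbol{i}} \mathcal{T}_{\boldsymbol{i}}^{\alpha}\mathcal{P}_{\boldsymbol{i}}^{1-\alpha} - 1\right)$ for normalized tensors. The paper's $L_{\alpha}$ may use a different but related normalization, and the function names `alpha_divergence` and `kl_divergence` are illustrative, not taken from the paper.

```python
import numpy as np


def alpha_divergence(t, p, alpha):
    """Amari-style alpha-divergence between normalized non-negative tensors.

    Assumed form (not necessarily the paper's L_alpha):
        D_alpha(T || P) = (sum_i T_i^alpha * P_i^(1 - alpha) - 1) / (alpha * (alpha - 1))
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("this sketch only covers alpha in (0, 1)")
    t = np.asarray(t, dtype=float).ravel()
    p = np.asarray(p, dtype=float).ravel()
    overlap = np.sum(t**alpha * p ** (1.0 - alpha))
    return float((overlap - 1.0) / (alpha * (alpha - 1.0)))


def kl_divergence(t, p):
    """KL(T || P) for normalized tensors, skipping entries where T is zero."""
    t = np.asarray(t, dtype=float).ravel()
    p = np.asarray(p, dtype=float).ravel()
    mask = t > 0
    return float(np.sum(t[mask] * np.log(t[mask] / p[mask])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.random((3, 4, 5)) + 0.1
    p = rng.random((3, 4, 5)) + 0.1
    t /= t.sum()
    p /= p.sum()
    for alpha in (0.2, 0.5, 0.9, 0.999):
        print(alpha, alpha_divergence(t, p, alpha))
    print("KL", kl_divergence(t, p))
```

Under this assumed form the divergence is non-negative for $\alpha\in(0,1)$ and approaches the KL divergence as $\alpha\to 1$, which is one way to see why a KL-based surrogate is a natural relaxation target.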

Abstract

Tensor-based discrete density estimation requires flexible modeling and proper divergence criteria to enable effective learning; however, traditional approaches using the $\alpha$-divergence face analytical challenges due to the $\alpha$-power terms in the objective function, which hinder the derivation of closed-form update rules. We present a generalization of the expectation-maximization (EM) algorithm, called the E$^2$M algorithm. It circumvents this issue by first relaxing the optimization into the minimization of a surrogate objective based on the Kullback-Leibler (KL) divergence, which is tractable via the standard EM algorithm, and subsequently applying a tensor many-body approximation in the M-step to enable simultaneous closed-form updates of all parameters. Our approach offers flexible modeling for not only a variety of low-rank structures, including the CP, Tucker, and Tensor Train formats, but also their mixtures, thus allowing us to leverage the strengths of different low-rank structures. We demonstrate the effectiveness of our approach in classification and density estimation tasks.
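
To illustrate what "closed-form updates of all parameters" can look like in the KL setting, the sketch below runs the standard EM iteration for a non-negative CP model $\mathcal{P}_{ijk}=\sum_r w_r A_{ir}B_{jr}C_{kr}$ of a normalized third-order tensor, treating the rank index as a latent variable. It covers only the inner KL step for a single CP structure and does not reproduce the paper's $\alpha$-divergence bounds or mixtures. The names `cp_density`, `kl`, and `em_cp_kl`, the three-way restriction, and the random initialization are assumptions made for illustration.

```python
import numpy as np


def cp_density(w, A, B, C):
    """Normalized CP model: P_ijk = sum_r w_r * A_ir * B_jr * C_kr."""
    return np.einsum("r,ir,jr,kr->ijk", w, A, B, C)


def kl(T, P):
    """KL(T || P), skipping entries where T is zero."""
    mask = T > 0
    return float(np.sum(T[mask] * np.log(T[mask] / P[mask])))


def em_cp_kl(T, rank, n_iter=50, seed=0):
    """EM for a rank-`rank` non-negative CP fit of a normalized 3-way tensor.

    Illustrative sketch: w lies on the simplex and each column of A, B, C
    sums to one, so the model is a mixture of product distributions.
    """
    T = np.asarray(T, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.full(rank, 1.0 / rank)
    factors = []
    for size in T.shape:
        F = rng.random((size, rank)) + 0.1
        factors.append(F / F.sum(axis=0))
    A, B, C = factors

    history = []
    for _ in range(n_iter):
        P = cp_density(w, A, B, C)
        history.append(kl(T, P))
        # E-step folded into the M-step: ratio_ijk * w_r A_ir B_jr C_kr is the
        # expected mass T_ijk * Pr(r | i, j, k) assigned to component r.
        ratio = np.divide(T, P, out=np.zeros_like(T), where=P > 0)
        nA = A * w * np.einsum("ijk,jr,kr->ir", ratio, B, C)
        nB = B * w * np.einsum("ijk,ir,kr->jr", ratio, A, C)
        nC = C * w * np.einsum("ijk,ir,jr->kr", ratio, A, B)
        # M-step: all parameters are updated simultaneously in closed form.
        w_new = np.maximum(nA.sum(axis=0), 1e-300)  # guard against empty components
        A, B, C = nA / w_new, nB / w_new, nC / w_new
        w = w_new
    history.append(kl(T, cp_density(w, A, B, C)))
    return w, A, B, C, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T = rng.random((4, 5, 6))
    T /= T.sum()
    w, A, B, C, history = em_cp_kl(T, rank=3, n_iter=30)
    print("KL before:", history[0], "KL after:", history[-1])
```

Each iteration uses only sums and element-wise products, so no step size or gradient tuning is involved, and the recorded KL values should be non-increasing up to floating-point error.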

Paper Structure

This paper contains 40 sections, 9 theorems, 80 equations, 11 figures, 4 tables, 2 algorithms.

Key Result

Proposition 1

For any non-negative normalized tensor $\mathcal{T}$, any tensor $\mathcal{F}$ that satisfies $\sum_{\boldsymbol{i}\in\Omega_{I}}\mathcal{T}_{\boldsymbol{i}}^\alpha\mathcal{F}_{\boldsymbol{i}} =1$, and any real number $\alpha\in(0,1)$, the function $L_{\alpha}(\mathcal{P})$ in Equation eq:objective

Figures (11)

  • Figure 1: (a) A discrete density estimation by $N$ samples $\bm{x}^{(1)},\dots,\bm{x}^{(N)}$ for $\bm{x}^{(n)}=(x^{(n)}_1,x^{(n)}_2,x^{(n)}_3)$ and $x^{(n)}_d \in [I_d]$. The empirical distribution $p(\bm{x})$ is identical to a non-negative normalized tensor $\mathcal{T}$, and the true distribution is estimated by its low-rank approximation $\mathcal{P}$. (b) The E$^2$M algorithm includes two E-steps. The E1-step makes the upper bound tight w.r.t. the objective function, and the E2-step makes the upper bound of the upper bound tight w.r.t. the upper bound.
  • Figure 2: Interaction diagram for (a) $\mathcal{Q}_{ijkl}=\mathcal{A}_{ijk}B_{ij}C_{il}D_{jl}$, (b) $\mathcal{Q}^{[\mathrm{CP}]}$, (c) $\mathcal{Q}^{[\mathrm{Tucker}]}$, and (d) $\mathcal{Q}^{[\mathrm{TT}]}$.
  • Figure 3: Comparison of the number of iterations required to reconstruct the color image SIPI house.
  • Figure 4: The leftmost panel shows the empirical distribution with outliers, and the subsequent panels show its reconstructions by E$^2$MCPTTB using different $\alpha$ values.
  • Figure 5: An example of a tensor tree structure represented by the tensor network.
  • ...and 6 more figures
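
The constructions named in the captions of Figures 1 and 2 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: `empirical_tensor` and `many_body_example` are hypothetical names, and indices are 0-based here whereas the caption's $x^{(n)}_d \in [I_d]$ is presumably 1-based.

```python
import numpy as np


def empirical_tensor(samples, shape):
    """Build the normalized empirical tensor T from N discrete samples.

    samples: integer array of shape (N, D), each row a 0-based index tuple.
    shape:   tuple (I_1, ..., I_D) giving the number of states per mode.
    """
    samples = np.asarray(samples, dtype=int)
    counts = np.zeros(shape, dtype=float)
    # np.add.at accumulates correctly even when the same index repeats.
    np.add.at(counts, tuple(samples.T), 1.0)
    return counts / len(samples)


def many_body_example(A, B, C, D):
    """Q_ijkl = A_ijk * B_ij * C_il * D_jl, as in Figure 2(a)."""
    return np.einsum("ijk,ij,il,jl->ijkl", A, B, C, D)


if __name__ == "__main__":
    samples = [[0, 0, 0], [0, 0, 0], [1, 2, 1], [1, 0, 1]]
    T = empirical_tensor(samples, shape=(2, 3, 2))
    print(T[0, 0, 0], T[1, 2, 1], T.sum())  # 0.5 0.25 1.0

    rng = np.random.default_rng(0)
    Q = many_body_example(
        rng.random((2, 3, 4)), rng.random((2, 3)), rng.random((2, 5)), rng.random((3, 5))
    )
    print(Q.shape)  # (2, 3, 4, 5)
```

In the second function each factor touches only a subset of the four indices, which is the kind of interaction pattern the diagram in Figure 2(a) depicts.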

Theorems & Definitions (10)

  • Proposition 1
  • Remark 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Theorem 1
  • Theorem 2: Optimal M-step in CP decomposition [huang2017kullback]
  • Proposition 6: The optimal M-step in Tucker decomposition
  • Proposition 7: The optimal M-step in Tensor Train decomposition