Nonparametric Estimation of Joint Entropy via Partitioned Sample-Spacing

Jungwoo Ho, Sangun Park, Soyeong Oh

Abstract

We propose a nonparametric estimator of multivariate joint entropy based on partitioned sample spacing (PSS). The method extends univariate spacing ideas to $\mathbb{R}^{d}$ by partitioning into localized cells and aggregating within-cell statistics, with strong consistency guarantees under mild conditions. In benchmarks across diverse distributions, PSS consistently outperforms $k$-nearest neighbor estimators and achieves accuracy competitive with recent normalizing flow-based methods, while requiring no training or auxiliary density modeling. The estimator scales favorably in moderately high dimensions ($d = 10$--$40$) and shows particular robustness to correlated or skewed distributions. These properties position PSS as a practical and reliable alternative to both $k$NN and NF-based entropy estimators, with broad utility in information-theoretic machine learning tasks such as total-correlation estimation, representation learning, and feature selection.

Paper Structure

This paper contains 46 sections, 12 theorems, 85 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

If $m \rightarrow \infty$ and $m/n \rightarrow 0$ as $n \rightarrow \infty$, then $\xi_{i}-X_{(i)}$ converges to 0 almost surely.
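
The window condition in Lemma 1 ($m \rightarrow \infty$, $m/n \rightarrow 0$) is the same growth condition under which the classical univariate $m$-spacing (Vasicek) entropy estimator is consistent. As a point of reference only, here is a minimal sketch of that classical estimator; it is not the paper's PSS estimator, and the function name `vasicek_entropy` and the default choice $m \approx \sqrt{n}$ are assumptions made for illustration.

```python
import math
import random


def vasicek_entropy(samples, m=None):
    """Classical m-spacing (Vasicek) estimate of differential entropy, in nats.

    Illustrative sketch of the univariate spacing idea; NOT the PSS estimator.
    """
    x = sorted(samples)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 samples")
    if m is None:
        # Assumed default: m ~ sqrt(n), so m grows while m/n shrinks to 0.
        m = max(1, round(math.sqrt(n)))
    if not 1 <= m < n:
        raise ValueError("m must satisfy 1 <= m < n")

    total = 0.0
    for i in range(n):
        # Order statistics X_(i-m) and X_(i+m), clamped at the sample extremes.
        lo = x[max(i - m, 0)]
        hi = x[min(i + m, n - 1)]
        # Assumes distinct sample values (continuous data), so hi > lo.
        total += math.log(n * (hi - lo) / (2 * m))
    return total / n


# Usage: Uniform(0, 1) has differential entropy 0 and N(0, 1) has
# 0.5 * log(2 * pi * e), so both estimates should land near those values.
rng = random.Random(0)
uniform_sample = [rng.random() for _ in range(5000)]
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]
print(vasicek_entropy(uniform_sample))
print(vasicek_entropy(normal_sample))
```

Clamping the indices at the sample extremes shortens the window near the boundaries, which introduces a small negative bias that fades as $m/n \rightarrow 0$.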

Figures (6)

  • Figure 1: Comparison of density estimation for a bivariate normal with $\rho=0.8$. The direct method (left) fails to capture the correlation, while the partitioned method (right) successfully reflects the joint structure.
  • Figure 2: RMSE (a, c) and runtime (b, d) for entropy estimators under the Normal distribution with $\rho=0$. Panels (a, b): $d=10$, varying sample size $N$. Panels (c, d): $N=3000$, varying dimension.
  • Figure 3: RMSE (a, c) and runtime (b, d) for entropy estimators under the Gamma distribution with shape = 0.4, scale = 0.3. Panels (a, b): $d=5$, varying sample size $N$. Panels (c, d): $\rho=0$, $N=30{,}000$, varying dimension.
  • Figure 4: RMSE (a, c) and runtime (b, d) for entropy estimators with varying correlation coefficient $\rho$. Panels (a, b): Normal distribution ($d=5$, $N=20{,}000$). Panels (c, d): Gamma distribution (shape $=0.4$, scale $=0.3$) with $d=7$, $N=50{,}000$.
  • Figure 5: Total Correlation values before ($\texttt{TC}_{\text{before}}$, hollow) and after ICA ($\texttt{TC}_{\text{after}}$, filled) for each estimator. Negative post-ICA TC values indicate estimator bias or variance, not true independence. PSS is the only estimator reporting a stable, low, and nonnegative $\texttt{TC}_{\text{after}}$, aligning with theoretical expectations.
  • ...and 1 more figure
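
The Figure 5 caption treats negative total correlation (TC) as an estimator artifact. That follows from the definition: TC is the sum of the marginal entropies minus the joint entropy, which equals a Kullback-Leibler divergence and so cannot be negative. The sketch below checks this in a case with a closed form, a Gaussian with unit variances and a common pairwise correlation $\rho$. The equicorrelated structure and all function names are assumptions chosen for illustration; they are not taken from the paper's experimental setup.

```python
import math


def gaussian_entropy(logdet_cov, d):
    """Differential entropy (nats) of a d-dimensional Gaussian, given log det of its covariance."""
    return 0.5 * (d * math.log(2 * math.pi * math.e) + logdet_cov)


def equicorrelated_logdet(d, rho):
    """log det of the d x d matrix with 1 on the diagonal and rho elsewhere (assumed structure)."""
    if d < 2:
        raise ValueError("d must be at least 2")
    if not -1.0 / (d - 1) < rho < 1.0:
        raise ValueError("rho outside the positive-definite range")
    # Eigenvalues: (1 - rho) with multiplicity d - 1, and 1 + (d - 1) * rho once.
    return (d - 1) * math.log(1.0 - rho) + math.log(1.0 + (d - 1) * rho)


def gaussian_total_correlation(d, rho):
    """TC = sum of marginal entropies - joint entropy, for the equicorrelated Gaussian."""
    marginals = d * gaussian_entropy(0.0, 1)  # each marginal is N(0, 1)
    joint = gaussian_entropy(equicorrelated_logdet(d, rho), d)
    return marginals - joint


# Usage: TC vanishes at independence and is positive once rho != 0.
for rho in (0.0, 0.4, 0.8):
    print(rho, gaussian_total_correlation(5, rho))
```

In this Gaussian case TC reduces to $-\tfrac{1}{2}\log\det R$ for the correlation matrix $R$, so an estimated TC below zero can only come from error in the entropy estimates.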

Theorems & Definitions (12)

  • Lemma 1: Consistency of $\xi_{i}$
  • Theorem 2: Consistency of $\hat{f}_n(x)$
  • Proposition 3
  • Lemma 4
  • Theorem 5: Consistency of $\hat{f}_{n,\ell}(x,y)$
  • Corollary 6: $L^1$ convergence of $\hat{f}_{n,\ell}$
  • Proposition 7
  • Theorem 8
  • Proposition 9
  • Theorem 10: Consistency of the Partitioned Multivariate Density Estimator
  • ...and 2 more