Memory Complexity of Estimating Entropy and Mutual Information

Tomer Berg, Or Ordentlich, Ofer Shayevitz

TL;DR

This work analyzes the memory required to estimate entropy and mutual information from an online data stream under a finite-state memory constraint. It introduces a memory-efficient estimator that combines Morris counters with a finite-state bias estimation machine to achieve additive entropy error ε with probability at least 1−δ, and proves lower bounds via a reduction to uniformity testing. The main results are S^*(n,ε,δ) = O(n (log n)^4/(ε^2 δ)) for ε not too small and S^*(n,ε,δ) = Ω(max{n, log n/ε}) for ε not too large, where S^*(n,ε,δ) is the minimal number of states for which the task is feasible. A parallel set of bounds for mutual information estimation scales as nm up to polylog factors. These findings quantify the memory cost of distribution-property estimation and give memory-efficient algorithms for entropy and MI estimation in streaming settings.
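To get a feel for the gap between the two bounds, the expressions above can be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, the universal constants are set to 1 as placeholders (their actual values matter in the paper), and the natural logarithm is assumed.

```python
import math


def upper_bound_states(n, eps, delta, c1=1.0):
    """Evaluate c1 * n * (log n)^4 / (eps^2 * delta).

    c1 = 1.0 is a placeholder, not the paper's constant;
    the natural logarithm is an assumption.
    """
    return c1 * n * math.log(n) ** 4 / (eps ** 2 * delta)


def lower_bound_states(n, eps, c2=1.0):
    """Evaluate c2 * max(n, log n / eps) with a placeholder constant c2."""
    return c2 * max(n, math.log(n) / eps)


if __name__ == "__main__":
    n, eps, delta = 1024, 0.1, 0.1
    print("upper bound expression:", upper_bound_states(n, eps, delta))
    print("lower bound expression:", lower_bound_states(n, eps))
```

With the constants fixed, both expressions grow linearly in n up to polylogarithmic factors; they differ in how they depend on ε and δ.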

Abstract

We observe an infinite sequence of independent identically distributed random variables $X_1,X_2,\ldots$ drawn from an unknown distribution $p$ over $[n]$, and our goal is to estimate the entropy $H(p)=-\mathbb{E}[\log p(X)]$ within an $\varepsilon$-additive error. To that end, at each time point we are allowed to update a finite-state machine with $S$ states, using a possibly randomized but time-invariant rule, where each state of the machine is assigned an entropy estimate. Our goal is to characterize the minimax memory complexity $S^*$ of this problem, which is the minimal number of states for which the estimation task is feasible with probability at least $1-\delta$ asymptotically, uniformly in $p$. Specifically, we show that there exist universal constants $C_1$ and $C_2$ such that $S^* \leq C_1\cdot\frac{n (\log n)^4}{\varepsilon^2\delta}$ for $\varepsilon$ not too small, and $S^* \geq C_2 \cdot \max \{n, \frac{\log n}{\varepsilon}\}$ for $\varepsilon$ not too large. The upper bound is proved using approximate counting to estimate the logarithm of $p$, and a finite memory bias estimation machine to estimate the expectation operation. The lower bound is proved via a reduction of entropy estimation to uniformity testing. We also apply these results to derive bounds on the memory complexity of mutual information estimation.
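The approximate-counting ingredient can be illustrated with the textbook Morris counter, which stores only an exponent $k$ and so needs roughly $\log n$ states to count up to $n$. The sketch below is a minimal version under stated assumptions: the class and function names are hypothetical, base 2 is assumed, and it does not reproduce how the paper wires the counter into its entropy estimator.

```python
import random


class MorrisCounter:
    """Textbook Morris counter: stores k and estimates the count as 2**k - 1."""

    def __init__(self, rng=None):
        self.k = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Move from k to k + 1 with probability 2**(-k); otherwise stay put.
        if self.rng.random() < 2.0 ** (-self.k):
            self.k += 1

    def estimate(self):
        # After n increments E[2**k] = n + 1, so 2**k - 1 is unbiased for n.
        return 2 ** self.k - 1


def average_estimate(n_events, n_trials, seed=0):
    """Average the estimate over independent counters (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        counter = MorrisCounter(rng)
        for _ in range(n_events):
            counter.increment()
        total += counter.estimate()
    return total / n_trials


if __name__ == "__main__":
    print("average estimate of 1000 events:", average_estimate(1000, 500))
```

A single counter is noisy (its standard deviation is on the order of the count itself), which is why the example averages many independent runs. The stored value $k$ tracks the logarithm of the count, which is the quantity the abstract says is needed for estimating $\log p$.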

Paper Structure

This paper contains 22 sections, 19 theorems, 100 equations, 2 figures, and 4 algorithms.

Key Result

Theorem 1

For any $c>1$, $\beta>0$, $0<\delta<1$ and $\varepsilon=10^{-5}+\beta+\psi_c(n)$, we have where and we set $C = 2(e+1)\cdot 10^8$ and $v_n(\alpha)\triangleq\sqrt{\frac{2c\alpha^3}{\log n}}+\frac{\alpha}{\log n}$. Moreover, there is an algorithm that attains this upper bound when the number of samples is $\Omega \left(\frac{n^c\cdot \mathop{\mathrm{poly}}(\log n)}{\delta}\cdot \mathop{\mathrm{poly}}(\log (1/\d

Figures (2)

  • Figure 1: The original Morris counter
  • Figure 2: Randomized bias estimation machine ($q=1-p$)

Theorems & Definitions (31)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4 (Flajolet, 1985)
  • Lemma 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • ...and 21 more