The Principle of Uncertain Maximum Entropy

Kenneth Bogert, Matthew Kothe

TL;DR

This work extends the classical maximum entropy framework to settings with uncertain empirical information induced by a memoryless channel. It introduces Uncertain Maximum Entropy (uMaxEnt), a convex, hierarchical optimization that jointly considers channel constraints and feature-based structure, selecting the most entropic among feasible solutions to bound the unknown distribution's entropy and quantify information loss. The approach generalizes prior notions like Latent Maximum Entropy and is validated through experiments, including multi-channel configurations and finite-sample approximations, demonstrating improved accuracy over traditional Max Entropy in many regimes. The results offer a principled interpretation of entropy under communication-induced uncertainty and provide practical algorithms and bounds for robust distribution estimation in noisy settings.

Abstract

The Principle of Maximum Entropy is a rigorous technique for estimating an unknown distribution given partial information while simultaneously minimizing bias. However, an important requirement for applying the principle is that the available information be provided error-free (Jaynes 1982). We relax this requirement using a memoryless communication channel as a framework to derive a new, more general principle. We show that our new principle provides an upper bound on the entropy of the unknown distribution, and that the amount of information lost due to the use of a given communications channel is unknown unless the unknown distribution's entropy is also known. Using our new principle, we provide a new interpretation of the classic principle and experimentally show its performance relative to the classic principle and other generally applicable solutions. Finally, we present a simple algorithm for solving our new principle and an approximation useful when samples are limited.
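
To make the error-free requirement concrete, the following is a minimal sketch (not the authors' algorithm) of how a discrete memoryless channel changes what an observer sees. It assumes the standard channel relation $P(y) = \sum_w P(y \mid w) P(w)$; the source distribution `p_w`, the matrix `channel`, and the indicator feature `phi` are hypothetical values chosen only for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def push_through_channel(p_w, channel):
    """Return P(Y), where P(y) = sum_w P(y | w) * P(w).

    channel[w][y] holds P(y | w); every row is assumed to sum to 1.
    """
    n_y = len(channel[0])
    return [
        sum(p_w[w] * channel[w][y] for w in range(len(p_w)))
        for y in range(n_y)
    ]


def expectation(p, phi):
    """Expected value of a feature phi under distribution p."""
    return sum(pi * fi for pi, fi in zip(p, phi))


# Hypothetical 3-symbol source and noisy channel (illustrative numbers only).
p_w = [0.7, 0.2, 0.1]
channel = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
phi = [1.0, 0.0, 0.0]  # hypothetical indicator feature for symbol 0

p_y = push_through_channel(p_w, channel)

# Feature expectation under the true source versus under the received symbols.
true_expectation = expectation(p_w, phi)
observed_expectation = expectation(p_y, phi)

print("P(Y) =", p_y)
print("E_W[phi] =", true_expectation, " E_Y[phi] =", observed_expectation)
print("H(W) =", entropy(p_w), " H(Y) =", entropy(p_y))
```

In this toy setting the feature expectation computed from received symbols differs from the one under the source, so treating the received statistics as if they were error-free would constrain the estimate with the wrong value.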

Paper Structure

This paper contains 18 sections, 6 theorems, 13 equations, 7 figures, 1 table, and 1 algorithm.

Key Result

Theorem 1

The satisfying set of program \ref{eq:u_max_ent_program} is convex.

Figures (7)

  • Figure 1: A discrete, memoryless communications channel showing the process of transmitting a message $w$ (top) and receiving (bottom)
  • Figure 2: $P(W)$ probability simplex for the system described above and possible solutions: program \ref{eq:max_ent_emppw} (solid green dot), applying program \ref{eq:max_ent} to the results of program \ref{eq:max_ent_emppw} (solid blue dot), two possible solutions using equation \ref{eqn:ml} with a uniform prior (triangles) or with the true distribution as the prior (darker triangle), and the distribution that satisfies all constraints (unfilled blue circle). For reference, the distribution with the maximum possible entropy is marked with a star, the solid blue vertical line shows all possible maximum entropy distributions given $\phi$, and the dashed green line shows all solutions to equation \ref{eqn:empPY}.
  • Figure 3: Box plot of the $D_{KL}$ achieved by uMaxEnt, dMaxEnt, and Multi-Channel uMaxEnt for two channels as $|\phi|$ is varied. Points mark the respective means; the line marks the median.
  • Figure 4: Box plot of the $D_{KL}$ achieved by uMaxEnt, dMaxEnt, and Multi-Channel uMaxEnt for two channels as $|\mathcal{Y}|$ is varied. Points mark the respective means; the line marks the median.
  • Figure 5: Scatter plot of the $D_{KL}$ achieved by multi-channel uMaxEnt against single-channel uMaxEnt for 20,000 points randomly selected from our data set (where $Y \neq W$). Note the log scales on the X and Y axes.
  • ...and 2 more figures
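
Figures 3 to 5 report results in terms of $D_{KL}$. As a reminder of how that quantity is computed for discrete distributions, here is a minimal sketch. It assumes the divergence is taken from a reference distribution `p` to an estimate `q`; the captions above do not state the direction, so that ordering and the example values are assumptions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in nats.

    Terms with p_i == 0 contribute nothing; if q_i == 0 while p_i > 0
    the divergence is infinite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


# Hypothetical reference distribution and estimate (illustrative only).
reference = [0.5, 0.5]
estimate = [0.9, 0.1]
divergence = kl_divergence(reference, estimate)
print("D_KL(reference || estimate) =", divergence)
```

A value of zero means the estimate matches the reference exactly, and larger values indicate a worse match.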

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Corollary 2.1
  • Corollary 2.2
  • Theorem 3
  • Theorem 4