Information Theory for Expectation Measures

Peter Harremoës

TL;DR

The paper develops an information-divergence framework for expectation measures, enabling analysis of nonrandom data and point-processes within a unified coding perspective. It derives explicit optimal coding rules for empirical texts, notably showing $\ell^*(a)=\ln\left(\frac{\|\mu^*\|}{\mu^*(a)}\right)$ with $\mu^*$ maximizing entropy over convex hulls of empirical measures, thereby connecting minimal description length to maximum-entropy principles. A Poisson interpretation then links $D(\mu||\nu)$ to divergences between Poisson processes $Po(\mu)$ and $Po(\nu)$, yielding a chain rule that separates sample-size uncertainty from letter-uncertainty and enabling information projection and reverse information projection analyses. Overall, the work ties Kraft’s inequality, MDL, and scoring rules to a generalized theory of information for expectation measures, with implications for nonstochastic data modeling and point-process applications.
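The two formulas quoted above can be checked numerically on a small finite alphabet. The sketch below is illustrative only and rests on assumptions not stated in this summary: that $D(\mu\Vert\nu)$ for finite measures takes the generalized form $\sum_a \mu(a)\ln\frac{\mu(a)}{\nu(a)}-\mu(a)+\nu(a)$, and that the chain rule reads $D(\mu\Vert\nu)=D(Po(\|\mu\|)\Vert Po(\|\nu\|))+\|\mu\|\,D(p\Vert q)$, with $p$ and $q$ the normalized measures. The function names and the example measures are hypothetical. The last lines also check that code lengths of the form $\ln(\|\mu\|/\mu(a))$ satisfy Kraft's inequality with equality (in nats).

```python
import math

def divergence(mu, nu):
    """Generalized divergence sum_a mu ln(mu/nu) - mu + nu (assumed definition)."""
    total = 0.0
    for a in mu:
        m, n = mu[a], nu[a]
        if m > 0:
            total += m * math.log(m / n)
        total += n - m
    return total

def normalize(mu):
    """Return the probability measure mu/||mu|| together with the total mass ||mu||."""
    mass = sum(mu.values())
    return {a: v / mass for a, v in mu.items()}, mass

def poisson_divergence(lam, kappa):
    """KL divergence between Poisson distributions with means lam and kappa."""
    return lam * math.log(lam / kappa) - lam + kappa

def chain_rule_rhs(mu, nu):
    """Sample-size term plus ||mu|| times the letter (normalized) term."""
    p, m = normalize(mu)
    q, n = normalize(nu)
    kl = sum(p[a] * math.log(p[a] / q[a]) for a in p if p[a] > 0)
    return poisson_divergence(m, n) + m * kl

def code_lengths(mu):
    """Code lengths ln(||mu|| / mu(a)) in nats, for letters with mu(a) > 0."""
    mass = sum(mu.values())
    return {a: math.log(mass / v) for a, v in mu.items() if v > 0}

# Hypothetical expectation measures on a three-letter alphabet.
mu = {"a": 3.0, "b": 1.5, "c": 0.5}
nu = {"a": 2.0, "b": 2.0, "c": 2.0}

lhs = divergence(mu, nu)
rhs = chain_rule_rhs(mu, nu)
kraft_sum = sum(math.exp(-l) for l in code_lengths(mu).values())
print(lhs, rhs, kraft_sum)
```

The two divergence values should agree up to floating-point error, and the Kraft sum should equal 1. The check does not compute the maximum-entropy measure $\mu^*$ itself; it only illustrates the shape of the code-length rule for a given measure.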

Abstract

Shannon based his information theory on the notion of probability measures as it was developed by Kolmogorov. In this paper we study some fundamental problems in information theory based on expectation measures. In the theory of expectation measures it is natural to study data sets where no randomness is present, and it is also natural to study information theory for point processes as well as sampling where the sample size is not fixed. Expectation measures in combination with Kraft's Inequality can be used to clarify in which cases probability measures can be used to quantify randomness.

Paper Structure

This paper contains 10 sections, 11 theorems, 41 equations, 2 figures.

Key Result

Proposition 3

Let $\mu$ and $\nu$ denote measures on $\left(\mathbb{A},\mathcal{G}\right),$ and assume that $D\left(\mu\Vert\nu\right)<\infty.$ Then there exists $B\in\mathcal{G}$ such that $\mu\left(\cdot\cap B\right)$ and $\nu\left(\cdot\cap B\right)$ are $\sigma$-finite and such that $\mu\left(\cdot\cap\comple

Figures (2)

  • Figure 1: At the maximum entropy measure, the level curve of the entropy function is tangent to the face that contains the maximum entropy measure.
  • Figure 2: At a measure that does not have maximum entropy, the level curve of the entropy function is not tangent to the face that contains the maximum entropy measure.

Theorems & Definitions (15)

  • Definition 1
  • Definition 2
  • Proposition 3
  • Definition 4
  • Theorem 5
  • Definition 6
  • Theorem 7
  • Theorem 8
  • Theorem 9
  • Theorem 10
  • ...and 5 more