On the Structure of Information
Sebastian Gottwald, Daniel A. Braun
TL;DR
The general treatment reveals how information can be decomposed into redundant, unique, and synergistic contributions, a question important in applications from neuroscience to machine learning, yet one for which existing formulations lack consensus on foundational definitions and can violate basic properties such as the chain rule or non-negativity.
Abstract
We characterize information as risk reduction between knowledge states represented by partitions of the underlying probability space. Entropy corresponds to risk reduction from no (or partial) knowledge to full knowledge about a random variable, while information corresponds to risk reduction from no (or partial) knowledge to partial knowledge. This applies to any information measure that is based on expected loss minimization, such as Bregman information, with Shannon information and variance as prominent examples. In each case, fundamental properties like the chain rule, non-negativity, and the relationship between information and divergence are preserved. Because partitions form a lattice under refinement, our general treatment reveals how information can be decomposed into redundant, unique, and synergistic contributions, a question important in applications from neuroscience to machine learning, yet one for which existing formulations lack consensus on foundational definitions and can violate basic properties such as the chain rule or non-negativity. Redundancy corresponds to Aumann's common knowledge, synergy to the gap between separately and jointly observed sources, and unique information is necessarily path-dependent, taking different values depending on what is already known. The resulting partial information decomposition is grounded directly in probability theory, avoids treating scalar information quantities as primitive compositional objects, and yields non-negative terms by construction.
