Partial information decomposition: redundancy as information bottleneck

Artemy Kolchinsky

TL;DR

The partial information decomposition (PID) aims to quantify the amount of redundant information that a set of sources provides about a target. This goal can be formulated as a type of information bottleneck problem, termed the “redundancy bottleneck” (RB).

Abstract

The partial information decomposition (PID) aims to quantify the amount of redundant information that a set of sources provides about a target. Here, we show that this goal can be formulated as a type of information bottleneck (IB) problem, termed the "redundancy bottleneck" (RB). The RB formalizes a tradeoff between prediction and compression: it extracts information from the sources that best predicts the target, without revealing which source provided the information. It can be understood as a generalization of "Blackwell redundancy", which we previously proposed as a principled measure of PID redundancy. The "RB curve" quantifies the prediction-compression tradeoff at multiple scales. This curve can also be quantified for individual sources, allowing subsets of redundant sources to be identified without combinatorial optimization. We provide an efficient iterative algorithm for computing the RB curve.
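
As a rough sketch of the tradeoff described above, the RB function can be written as a constrained optimization over bottleneck variables $Q$. The symbols follow the figure captions: $S$ is the source index with distribution $\nu_{S}$, $I(Q;Y\vert S)$ is the prediction term, and $I(Q;S\vert Y)$ is the compression cost bounded by $R$. The feasible set $\mathcal{Q}$ is a placeholder introduced here for illustration; the precise admissibility conditions on $Q$ are given in the paper and are not reproduced in this summary.

```latex
% Sketch only: \mathcal{Q} is a hypothetical placeholder for the set of
% admissible bottleneck variables; its exact definition is not stated here.
I_{\text{RB}}(R) \;=\; \max_{Q \in \mathcal{Q}} \; I(Q;Y \vert S)
\quad \text{subject to} \quad I(Q;S \vert Y) \le R
```

Read this way, a small $R$ forces $Q$ to carry almost no information about which source it came from, while a larger $R$ relaxes that requirement in exchange for better prediction of the target.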

Paper Structure (13 sections, 4 theorems, 80 equations, 4 figures)

This paper contains 13 sections, 4 theorems, 80 equations, 4 figures.

Introduction
Background
Information Bottleneck (IB)
Partial Information Decomposition
Blackwell Redundancy
Redundancy Bottleneck
Reformulation of Blackwell Redundancy
Redundancy Bottleneck
Contributions From Different Sources
Examples
Continuity
Iterative Algorithm
Discussion

Key Result

Theorem 1

Blackwell redundancy (eq:opt1) can be expressed as

Figures (4)

Figure 1: RB analysis for the UNIQUE gate (Example 1). (a) Prediction values found by optimizing the RB Lagrangian (\ref{eq:opt3}) at different $\beta$. Colored regions indicate contributions from different sources, $\nu_{S}(s)I(Q;Y\vert S=s)$ from Eq. (\ref{eq:decomp0-1}). For this system, only source $X_{1}$ contributes to the prediction. (b) Compression costs found by optimizing the RB Lagrangian at different $\beta$. Colored regions indicate contributions from different sources, $\nu_{S}(s)I(Q;S=s\vert Y)$ from Eq. (\ref{eq:decompC}). (c) The RB curve shows the tradeoff between optimal compression and the prediction values; the marker colors correspond to the $\beta$ values as in (a) and (b). All bottleneck variables $Q$ must fall within the accessible grey region. (d) RB curves for individual sources.
Figure 2: RB analysis for the system with 4 binary symmetric channels (Example 3). (a) and (b) Prediction and compression values found by optimizing the RB Lagrangian (\ref{eq:opt3}) at different $\beta$. Contributions from individual sources are shown as shaded regions. (c) The RB curve shows the tradeoff between optimal compression and prediction values; marker colors correspond to the $\beta$ values as in (a) and (b). (d) RB curves for individual sources.
Figure 3: RB analysis for the system with a 3-spin target (Example 4). (a) and (b) Prediction and compression values found by optimizing the RB Lagrangian (\ref{eq:opt3}) at different $\beta$. Contributions from individual sources are shown as shaded regions. (c) The RB curve shows the tradeoff between optimal compression and prediction values; marker colors correspond to the $\beta$ values as in (a) and (b). (d) RB curves for individual sources.
Figure 4: The RB function $I_{\text{RB}}(R)$ is continuous in the underlying probability distribution for $R>0$, while Blackwell redundancy can be discontinuous. This is illustrated on the COPY gate, $Y=(X_{1},X_{2})$, as a function of the correlation strength $\epsilon$ between $X_{1}$ and $X_{2}$ (perfect correlation at $\epsilon=0$, independence at $\epsilon=1$). Blackwell redundancy jumps from $I_{\cap}=1$ at $\epsilon=0$ to $I_{\cap}=0$ at $\epsilon>0$, while $I_{\text{RB}}(R)$ (at $R=0.01$) decays continuously.
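
The shaded regions in panels (a) of Figures 1 to 3 show per-source prediction contributions $\nu_{S}(s)I(Q;Y\vert S=s)$, which sum to $I(Q;Y\vert S)$ by the standard decomposition of conditional mutual information. The following is a minimal Python sketch of that decomposition for a given joint distribution. The function name `prediction_contributions`, the array layout `p_sqy[s, q, y]`, and the toy distribution are all hypothetical choices made for illustration; the sketch does not perform the RB optimization and does not reproduce the paper's iterative algorithm.

```python
import numpy as np

def prediction_contributions(p_sqy):
    """Return nu_S(s) * I(Q;Y|S=s) in bits for each source index s.

    p_sqy[s, q, y] is a joint distribution over the source index S,
    the bottleneck variable Q, and the target Y (hypothetical layout).
    """
    p = np.asarray(p_sqy, dtype=float)
    nu = p.sum(axis=(1, 2))               # marginal nu_S(s)
    contribs = np.zeros(len(nu))
    for s in range(len(nu)):
        if nu[s] == 0:
            continue                       # unused source contributes nothing
        p_qy = p[s] / nu[s]                # p(q, y | S=s)
        p_q = p_qy.sum(axis=1, keepdims=True)
        p_y = p_qy.sum(axis=0, keepdims=True)
        mask = p_qy > 0                    # skip zero-probability cells
        mi = np.sum(p_qy[mask] * np.log2(p_qy[mask] / (p_q * p_y)[mask]))
        contribs[s] = nu[s] * mi
    return contribs

# Toy joint (an illustrative assumption, not taken from the paper):
# under S=0, Q is a perfect copy of a uniform binary Y;
# under S=1, Q is independent of Y.
toy = np.zeros((2, 2, 2))
toy[0, 0, 0] = 0.25
toy[0, 1, 1] = 0.25
toy[1, :, :] = 0.125

contribs = prediction_contributions(toy)
total_prediction = contribs.sum()          # equals I(Q;Y|S)
```

In this toy case only the first source index contributes to the prediction term, which is loosely analogous to the single-source pattern described for Figure 1(a), although the toy distribution is not the UNIQUE gate itself.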

Theorems & Definitions (8)

Theorem 1
Theorem 2
Theorem 3
Theorem 4
proof
proof
proof
proof
