An Achievable and Analytic Solution to Information Bottleneck for Gaussian Mixtures

Yi Song; Kai Wan; Zhenyu Liao; Giuseppe Caire

TL;DR

This work addresses an information bottleneck (IB) problem arising in a remote source coding setup where a binary source $Y$ is observed through a Gaussian mixture $X$, and an intermediate node compresses the observation into $T$ under a rate constraint $I(X; T) \le R$ to maximize $I(Y; T)$ under log-loss. It proposes three analytically tractable IB schemes (two-level random quantization, multi-level deterministic quantization, and soft quantization using $\tanh$) as achievable solutions, with the Blahut-Arimoto (BA) algorithm serving as a numerically near-optimal benchmark and information dropout as a competing approach. The results show that the proposed schemes achieve near-optimal performance across signal-to-noise ratios (SNRs), outperform the information dropout baseline, and extend to vector Gaussian mixture observations, with applications to binary classification under information leakage and validation on MNIST. The paper also connects IB to remote source coding and provides closed-form strategies for efficient, privacy-aware information extraction relevant to communications and learning systems.

Abstract

In this paper, we study a remote source coding scenario in which binary phase shift keying (BPSK) modulation sources are corrupted by additive white Gaussian noise (AWGN). An intermediate node, such as a relay, receives these observations and performs additional compression to balance complexity and relevance. This problem can be further formulated as an information bottleneck (IB) problem with Bernoulli sources and Gaussian mixture observations. However, no closed-form solution exists for this IB problem. To address this challenge, we propose a unified achievable scheme that employs three different compression/quantization strategies for intermediate node processing by using two-level quantization, multi-level deterministic quantization, and soft quantization with the hyperbolic tangent ($\tanh$) function, respectively. In addition, we extend our analysis to the vector mixture Gaussian observation problem and explore its application in machine learning for binary classification with information leakage. Numerical evaluations show that the proposed scheme has a near-optimal performance over various signal-to-noise ratios (SNRs), compared to the Blahut-Arimoto (BA) algorithm, and has better performance than some existing numerical methods such as the information dropout approach. Furthermore, experiments conducted on the realistic MNIST dataset also validate the superior classification accuracy of our method compared to the information dropout approach.

Paper Structure (11 sections, 1 theorem, 16 equations, 2 figures)


Introduction
Introduction of IB and its applications in communications
Applications of IB in machine learning
Main contributions
Notations and organization of the paper
System Model and Preliminary Results
Formulation of the IB Problem
Approximately numerically optimal scheme: Blahut-Arimoto (BA) algorithm
State-of-the-art scheme: information dropout method
Achievable bounds for Binary-Gaussian IB problem
An achievable IB solution via two-level random quantization

Key Result

Proposition 1

For the IB problem in \ref{eq:IB_HQ_LB} with symmetric Bernoulli $Y$ and $X|Y \sim \mathcal{N}(y \beta, 1)$ as in \ref{eq:def_model_scalar}, and for $0 \leq R \leq \ln 2$, the optimal rate $I^{\star}(Y;T)$ is lower bounded by $I_1(q)$ given in \ref{eq:I_1(q)}, where $p = P_{\overline{X}|Y} (\overline{x}= 1|y =-1) = P_{\overline{X}|Y} (\overline{x}= 0|y =1) = \int_{0}^{\infty} \frac{1}{\sqrt{2 \pi}} \exp(-(x + \beta)^2/2)\, dx$ …
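A minimal numerical sketch of a two-level random quantization bound follows. It rests on assumptions that are not stated in the text above: the intermediate node takes the hard decision $\overline{X}$ (the sign of $X$), then flips it with probability $q$, and $q$ is chosen so that $\ln 2 - H(q) = R$. These choices agree with the endpoint behaviour described in Remark 1 ($q = 1/2$ at $R = 0$, $q \in \{0, 1\}$ at $R = \ln 2$), but they are not confirmed to be the paper's exact $I_1(q)$. All function names are hypothetical, and entropies are in nats.

```python
import math

def q_func(t):
    """Gaussian tail probability Q(t) = P(Z > t) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def binary_entropy(p):
    """Binary entropy in nats, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def flip_prob_for_rate(rate, tol=1e-12):
    """Find q in [0, 1/2] with ln 2 - H(q) = rate (assumed rate rule)."""
    target = math.log(2.0) - rate
    lo, hi = 0.0, 0.5
    # H(q) is increasing on [0, 1/2], so plain bisection is enough.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_entropy(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def two_level_bound(beta, rate):
    """I(Y; T) in nats for hard decision followed by a random bit flip."""
    p = q_func(beta)                       # hard-decision error probability
    q = flip_prob_for_rate(rate)           # flip probability meeting the rate
    crossover = p * (1.0 - q) + q * (1.0 - p)  # two cascaded binary symmetric channels
    return math.log(2.0) - binary_entropy(crossover)

for rate in (0.0, 0.3, math.log(2.0)):
    print(rate, two_level_bound(beta=1.0, rate=rate))
```

Under these assumptions the bound rises from $0$ at $R = 0$ to $\ln 2 - H(p)$ nats at $R = \ln 2$, which is the plain hard-decision quantizer with no added randomness.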

Figures (2)

Figure 1: The system diagram of the remote source coding theory.
Figure 2: Diagram of the information bottleneck problem.

Theorems & Definitions (3)

Proposition 1: An achievable IB solution via two-level quantization
Proof: Proof of Proposition 1
Remark 1: IB solution with two-level quantization for $R\in [0, \ln 2)$. When $R=0$ nats, according to the definition of $q$ in \ref{eq:q}, we have $q = 1/2$, leading to an optimal $I(Y; T)$ of $0$ based on \ref{eq:I_1(q)}. Similarly, for $R= \ln 2$ nats, the optimal value of $q$ that satisfies \ref{eq:q} can be either $0$ or $1$. From \ref{eq:I_1(q)}, we obtain $I(Y; T) = 1 - H(p)$ in this case.

In our second approach, we set the random noise $N=0$ in \ref{eq:obj_func_1} and employ an $L$-level deterministic quantizer $\widehat{Q}(\cdot)$ to map the observation $X$ into $L$ bins, with the intermediate representation $T$ given by

$$T = f_{\text{non-linear}}(X) \overset{\Delta}{=} \widehat{Q}(X).$$

Here, the quantization points are denoted as $\{q_i\}_{i=1}^{L-1}$, with $q_0 = -\infty$ and $q_L = \infty$, and $T$ is quantized as $t_j$ (the center of the quantization region) for $X \in [q_{j-1}, q_j]$, $\forall~ j \in \{1, \cdots, L\}$. Consequently, the conditional probability in \ref{eq:conditional proba} becomes

$$\mathbb{P}\left(T = t_j = \frac{q_{j-1} + q_j}{2} \,\Big|\, Y\right) = \mathbb{P}(q_{j-1} \leq X \leq q_j \,|\, Y) = Q(q_{j-1} - \beta Y) - Q(q_j - \beta Y), \quad \forall j \in \{1, \cdots, L\},$$

where $Q(t) = \int_{t}^{\infty} \frac{1}{\sqrt{2\pi}}\exp(-x^2/2)\, dx$ is the Gaussian Q-function. Since the mapping from $X$ to $T$ is deterministic, the mutual information $I(X; T)$ becomes the entropy of $T$, i.e., $I(X; T) = H(T)$. We obtain a lower bound to the original IB in \ref{eq:obj_func_1} by solving the following problem:

$$\max_{\{q_i\}_{i=1}^{L-1}} \quad I(Y; T) \quad \text{s.t.} \quad H(T) \leq R.$$
To solve the problem \ref{eq:IB_DQ_LB} analytically, we can obtain a lower bound by setting the quantization level $L$ as $\lceil e^R \rceil$ and the probability of quantized $T$ space as

$$\mathbb{P}(T = t_j) = \begin{cases} \dfrac{1}{\lceil e^R \rceil} - \Delta, & \text{if } j = 1, \\[2mm] \dfrac{1}{\lceil e^R \rceil} + \dfrac{\Delta}{\lceil e^R \rceil-1}, & \text{if } j \neq 1, \end{cases}$$

where the shift value $\Delta$ is determined to satisfy constraint \ref{eq:constriat_detfunc} as

$$H(T) = -\left( \frac{1}{\lceil e^R \rceil} - \Delta\right) \log \left( \frac{1}{\lceil e^R \rceil} - \Delta\right) - \sum_{j=2}^{L} \left( \frac{1}{\lceil e^R \rceil} + \frac{\Delta}{\lceil e^R \rceil-1}\right) \log\left( \frac{1}{\lceil e^R \rceil} + \frac{\Delta}{\lceil e^R \rceil-1}\right) \overset{\Delta}{=} R.$$

Therefore, according to \ref{eq:det_p_t_y}, quantization points $\{q_j\}_{j=1}^{L-1}$ can also be obtained by

$$\begin{aligned} \mathbb{P}(q_{j-1} \leq X \leq q_j) &= \mathbb{P}(Y=1)\, \mathbb{P}(q_{j-1} \leq X \leq q_j \,|\, Y=1) + \mathbb{P}(Y=-1)\, \mathbb{P}(q_{j-1} \leq X \leq q_j \,|\, Y=-1) \\ &= \tfrac{1}{2}\left( Q(q_{j-1} - \beta) - Q(q_j - \beta) \right) + \tfrac{1}{2} \left( Q(q_{j-1} + \beta) - Q(q_j + \beta)\right) \\ &\overset{\Delta}{=} \mathbb{P}(T=t_j), \end{aligned}$$

where $\mathbb{P}(T=t_j)$ is defined in \ref{eq:prob_t}. Note that if $R\leq \ln 2$, the quantization level in this scheme is set as $L=2$, similar to the two-level quantization scheme. The deterministic quantization approach outlined above leads to the following proposition.
For the IB problem in \ref{eq:IB_DQ_LB} with symmetric Bernoulli $Y$ and $X|Y \sim \mathcal{N}(y \beta, 1)$ as in \ref{eq:def_model_scalar}, the optimal rate $I^{\star}(Y; T)$ is lower bounded by $I_2(\Delta)$, the mutual information $I(Y; T)$ given $\Delta$, where $\Delta$ is the solution to \ref{eq:Delta}, and the quantization points $\{q_j\}_{j=1}^{\lceil e^R \rceil}$ can be obtained from

$$\mathbb{P}(q_{j-1} \leq X \leq q_j) = \begin{cases} \dfrac{1}{\lceil e^R \rceil} - \Delta, & \text{if } j=1, \\[2mm] \dfrac{1}{\lceil e^R \rceil} + \dfrac{\Delta}{\lceil e^R \rceil-1}, & \text{otherwise}. \end{cases}$$
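The deterministic quantization recipe can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: $\Delta$ is found by bisection on $H(T) = R$, each quantization point is found by bisection on the mixture CDF of $X$, and $I_2(\Delta)$ is taken to be $I(Y; T)$ evaluated at those points. The function names, the search bracket of $\pm(\beta + 12)$, and the tolerance on $\lceil e^R \rceil$ are all hypothetical choices; entropies are in nats.

```python
import math

def q_func(t):
    """Gaussian tail probability Q(t) = P(Z > t) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def entropy(probs):
    """Entropy in nats, skipping zero-probability bins."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def bin_probs(rate):
    """Return (L, delta, probs): shifted-uniform bin probabilities with H(T) = rate."""
    # L = ceil(e^R), at least 2; the tolerance guards against exp() rounding up.
    L = max(2, math.ceil(math.exp(rate) - 1e-12))

    def probs_for(delta):
        return [1.0 / L - delta] + [1.0 / L + delta / (L - 1)] * (L - 1)

    # H(T) decreases from ln L (delta = 0) to ln(L - 1) (delta = 1/L).
    lo, hi = 0.0, 1.0 / L
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if entropy(probs_for(mid)) > rate:
            lo = mid
        else:
            hi = mid
    delta = 0.5 * (lo + hi)
    return L, delta, probs_for(delta)

def mixture_cdf(x, beta):
    """CDF of X when Y is uniform on {-1, +1} and X | Y ~ N(beta * Y, 1)."""
    return 1.0 - 0.5 * (q_func(x - beta) + q_func(x + beta))

def quantization_points(rate, beta):
    """Solve P(X <= q_j) = sum_{i <= j} P(T = t_i) for each interior point q_j."""
    _, _, probs = bin_probs(rate)
    points, cum = [], 0.0
    for p in probs[:-1]:
        cum += p
        lo, hi = -beta - 12.0, beta + 12.0  # assumed bracket, wide enough here
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if mixture_cdf(mid, beta) < cum:
                lo = mid
            else:
                hi = mid
        points.append(0.5 * (lo + hi))
    return points

def relevance(points, beta):
    """I(Y; T) in nats for the deterministic quantizer with the given points."""
    edges = [-math.inf] + list(points) + [math.inf]
    info = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        # P(T = t_j | Y = y) = Q(q_{j-1} - beta * y) - Q(q_j - beta * y)
        cond = [q_func(a - beta * y) - q_func(b - beta * y) for y in (1, -1)]
        marg = 0.5 * sum(cond)
        info += sum(0.5 * c * math.log(c / marg) for c in cond if c > 0.0)
    return info

points = quantization_points(rate=1.0, beta=1.0)
print(points, relevance(points, beta=1.0))
```

With `rate=1.0` the sketch uses $L = 3$ bins. Passing a single point at zero, `relevance([0.0], beta)`, recovers the hard-decision value $\ln 2 - H(Q(\beta))$ nats, which gives a quick sanity check against the two-level case.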
