Neural Distributed Compressor Discovers Binning
Ezgi Ozyilkan, Johannes Ballé, Elza Erkip
TL;DR
The paper tackles the one-shot Wyner–Ziv problem with a data-driven, unstructured entropy-constrained vector quantization (ECVQ) framework that exploits decoder side information without requiring explicit knowledge of the source distributions. It introduces two neural formulations: a marginal variant paired with a classic entropy coder, and a conditional variant paired with an ideal Slepian–Wolf coder. Each is backed by a variational upper bound, on the entropy of the latent representation in the marginal case and on its conditional entropy given the side information in the conditional case. Experiments on Gaussian and Laplacian sources show binning emerging in the source space and performance close to the Wyner–Ziv bound, including optimal decoding within quantization regions and symmetry-driven binning in the Laplacian case. The results provide evidence that data-driven learning can recover fundamental Wyner–Ziv mechanisms such as binning and joint decoding from the quantization index and the side information, suggesting a practical path toward low-latency distributed compression without strong distributional priors.
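The variational bounds mentioned above can be sketched as follows. The notation is an assumption made for illustration, not necessarily the paper's: $U$ is the discrete index the encoder produces from the source $X$, $Y$ is the side information, $\hat{X}$ is the reconstruction, $d$ is a distortion measure, $\lambda > 0$ is a trade-off weight, and $q$ is any learned probability model. Because cross-entropy upper-bounds entropy, for any such $q$:

```latex
\begin{align}
  H(U)        &\le \mathbb{E}\bigl[-\log q(U)\bigr], \\
  H(U \mid Y) &\le \mathbb{E}\bigl[-\log q(U \mid Y)\bigr],
\end{align}
% with equality when q matches the true (conditional) distribution of U.
% One common way to turn these into training objectives (an assumed form):
\begin{align}
  \mathcal{L}_{\text{marginal}}    &= \mathbb{E}\bigl[-\log q(U)\bigr]
      + \lambda\, \mathbb{E}\bigl[d(X, \hat{X})\bigr], \\
  \mathcal{L}_{\text{conditional}} &= \mathbb{E}\bigl[-\log q(U \mid Y)\bigr]
      + \lambda\, \mathbb{E}\bigl[d(X, \hat{X})\bigr].
\end{align}
```

In this reading, the first objective corresponds to a rate achievable with a classic entropy coder, and the second to the rate an ideal Slepian–Wolf coder could reach, since in both cases the decoder forms $\hat{X}$ from $U$ and $Y$ together.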
Abstract
We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner-Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme, based on variational vector quantization, recovers some principles of the optimum theoretical solution of the Wyner-Ziv setup, such as binning in the source space as well as optimal combination of the quantization index and side information, for exemplary sources. These behaviors emerge although no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning.
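To make the notion of binning concrete, here is a minimal hand-designed sketch in Python. It is not the learned compressor described above: the modulo binning rule, the Gaussian source model, and all names and parameters (`step`, `num_bins`, the noise level 0.1) are assumptions chosen for illustration. The encoder quantizes the source uniformly and sends only the quantization index modulo `num_bins`, so distant quantization cells share one bin; the decoder uses the side information to pick the most plausible cell within that bin.

```python
import numpy as np

def encode(x, step, num_bins):
    """Uniformly quantize x, then keep only the index modulo num_bins (the bin)."""
    idx = np.round(x / step).astype(np.int64)
    return np.mod(idx, num_bins)  # non-negative bin label in [0, num_bins)

def decode(bin_idx, y, step, num_bins):
    """Among all cells sharing this bin, choose the one closest to side info y."""
    # Candidate indices are bin_idx + num_bins * k for integer k.
    k = np.round((y / step - bin_idx) / num_bins)
    idx = bin_idx + num_bins * k
    return idx * step

# Assumed toy model: side information Y, source X = Y + small Gaussian noise.
rng = np.random.default_rng(0)
n = 100_000
y = rng.normal(0.0, 1.0, n)
x = y + rng.normal(0.0, 0.1, n)

step, num_bins = 0.1, 16
bins = encode(x, step, num_bins)
x_hat = decode(bins, y, step, num_bins)

# Reference: the same quantizer with the full (unbinned) index sent.
x_plain = np.round(x / step) * step

mse_binned = np.mean((x - x_hat) ** 2)
mse_plain = np.mean((x - x_plain) ** 2)
print(f"bins used: {np.unique(bins).size}")
print(f"MSE with binning + side info: {mse_binned:.6f}")
print(f"MSE with full index:          {mse_plain:.6f}")
```

With these assumed parameters, cells in the same bin are 1.6 apart while the source rarely strays that far from the side information, so the decoder almost always resolves the bin to the correct cell and the binned scheme matches the distortion of sending the full index with only 16 labels. The abstract's point is that a comparable many-to-one grouping of source regions emerged from training, without being imposed as it is here.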
