
Neural Distributed Source Coding

Jay Whang, Alliot Nagle, Anish Acharya, Hyeji Kim, Alexandros G. Dimakis

TL;DR

This work introduces Neural DSC, a framework that learns distributed lossy compression for high-dimensional, arbitrarily correlated sources by coupling a conditional VQ-VAE with a latent prior for entropy coding. It connects distributed source coding to a modified ELBO objective (dELBO) and shows that the setting in which side information is available only at the decoder can be modeled effectively with a conditional VQ-VAE, and that adding a latent prior improves the rate. Empirically, Neural DSC achieves state-of-the-art PSNR on KITTI stereo images at rates above 0.1 bpp, handles complex correlations beyond simple spatial overlap, and even extends to gradient compression for distributed training, all with significantly fewer parameters than prior baselines. The results highlight the practicality of data-driven, learned DSC and point to broader applications in multi-view and cross-modal compression, as well as potential integration with traditional DISCUS-like schemes.
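To make the encoder/decoder split concrete, here is a minimal pure-Python sketch of the vector-quantization step in a conditional VQ-VAE with decoder-only side information. It is not the authors' implementation: the toy codebook, the pre-computed latent vectors standing in for the encoder network, and the additive `fuse` rule standing in for the learned SI Net + Cond Net are all hypothetical. What it shows is structural: `encode` never receives `y`, while `decode` does.

```python
from typing import Callable, List, Sequence

Vec = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def quantize(z_e: Sequence[float], codebook: Sequence[Vec]) -> int:
    """VQ step: index of the codebook entry nearest to the encoder output z_e."""
    return min(range(len(codebook)), key=lambda k: sq_dist(z_e, codebook[k]))


def encode(x_latents: Sequence[Vec], codebook: Sequence[Vec]) -> List[int]:
    """Distributed encoder: maps each latent vector of x to a code index.
    y is deliberately not a parameter; the encoder never sees side information."""
    return [quantize(z, codebook) for z in x_latents]


def decode(indices: Sequence[int], codebook: Sequence[Vec],
           y_feats: Sequence[Vec], fuse: Callable[[Vec, Vec], Vec]) -> List[Vec]:
    """Conditional decoder: looks up each code vector and fuses it with features of y."""
    return [fuse(codebook[k], f) for k, f in zip(indices, y_feats)]


# Hypothetical toy setup: a 3-entry codebook, latents assumed to come from some
# encoder network, and a fusion rule that simply adds side-information features
# (a stand-in for the learned networks that process y).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
x_latents = [[0.9, 0.1], [0.1, 0.8], [0.05, -0.1]]
y_feats = [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0]]

idx = encode(x_latents, codebook)
recon = decode(idx, codebook, y_feats, lambda c, f: [ci + fi for ci, fi in zip(c, f)])
print(idx)    # [1, 2, 0]
print(recon)  # [[1.1, 0.0], [0.0, 1.1], [0.0, 0.0]]
```

Only the integer indices in `idx` would be transmitted; everything that depends on `y` happens after transmission.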

Abstract

Distributed source coding (DSC) is the task of encoding an input in the absence of correlated side information that is only available to the decoder. Remarkably, Slepian and Wolf showed in 1973 that an encoder without access to the side information can asymptotically achieve the same compression rate as when the side information is available to it. While there is vast prior work on this topic, practical DSC has been limited to synthetic datasets and specific correlation structures. Here we present a framework for lossy DSC that is agnostic to the correlation structure and can scale to high dimensions. Rather than relying on hand-crafted source modeling, our method utilizes a conditional Vector-Quantized Variational Autoencoder (VQ-VAE) to learn the distributed encoder and decoder. We evaluate our method on multiple datasets and show that our method can handle complex correlations and achieves state-of-the-art PSNR. Our code is made available at https://github.com/acnagle/neural-dsc.
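As a worked example of the Slepian-Wolf result in the lossless case, consider a hypothetical binary source: X is a fair coin flip and the side information Y equals X flipped with probability p. Coding X on its own costs H(X) = 1 bit per symbol, whereas the Slepian-Wolf rate is H(X|Y) = h(p), the binary entropy of p, and that rate is achievable even though only the decoder sees Y. The short computation below evaluates both for an assumed p = 0.1.

```python
import math


def h2(p: float) -> float:
    """Binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


# Hypothetical correlation model: X ~ Bernoulli(1/2), Y = X xor N, N ~ Bernoulli(p).
p = 0.1
rate_without_si = h2(0.5)    # H(X): cost of X if Y is ignored entirely
rate_slepian_wolf = h2(p)    # H(X|Y): achievable even when only the decoder has Y

print(rate_without_si)                # 1.0
print(round(rate_slepian_wolf, 3))    # 0.469
```

With this correlation the side information lets the rate drop from 1 bit to about 0.469 bits per symbol, without the encoder ever observing Y.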

Paper Structure

This paper contains 45 sections, 1 theorem, 9 equations, 13 figures, 5 tables.

Key Result

Proposition 1

Let $\bm{x}, \bm{y}, \bm{z}$ be random variables following the generative process $\bm{y} \rightarrow \bm{x} \leftarrow \bm{z}$, i.e., $\bm{z}$ is a latent variable that is independent of $\bm{y}$ and $p(\bm{x},\bm{y},\bm{z}) = p(\bm{x}|\bm{y}, \bm{z})p(\bm{y})p(\bm{z})$. Then for any choice of …
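The proposition statement is cut off in this summary. For orientation, the following is the standard variational bound that the stated generative process supports. It assumes an encoder distribution $q(\bm{z}|\bm{x})$ that, as in the distributed setting, does not depend on $\bm{y}$, and whose support covers that of $p(\bm{z})$. The first equality uses the independence of $\bm{z}$ and $\bm{y}$, and the inequality is Jensen's. This is a sketch of the kind of objective the TL;DR calls dELBO, not a verbatim reproduction of Proposition 1.

```latex
\log p(\bm{x}\mid\bm{y})
  = \log \int p(\bm{x}\mid\bm{y},\bm{z})\, p(\bm{z})\, d\bm{z}
  = \log \mathbb{E}_{q(\bm{z}\mid\bm{x})}\!\left[
      \frac{p(\bm{x}\mid\bm{y},\bm{z})\, p(\bm{z})}{q(\bm{z}\mid\bm{x})}\right]
  \ge \mathbb{E}_{q(\bm{z}\mid\bm{x})}\big[\log p(\bm{x}\mid\bm{y},\bm{z})\big]
      - D_{\mathrm{KL}}\big(q(\bm{z}\mid\bm{x}) \,\|\, p(\bm{z})\big)
```

Compared with the usual ELBO, the decoder term is conditioned on $\bm{y}$ while the encoder and the KL (rate) term are not, which mirrors a distributed encoder paired with a side-informed decoder.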

Figures (13)

  • Figure 1: A distributed encoder that has no access to the correlated side-information (left) can asymptotically achieve the same compression rate as when side-information is available at both the encoder and the decoder (right).
  • Figure 2: We use a modified VQ-VAE architecture with correlated side information at the decoder. We train a prior model over the quantized latents to achieve better compression rates via arithmetic coding. The SI Net + Cond Net is a convolutional neural network with residual connections and is similar to the encoder in its architecture. The exact architectures of the encoder and decoder and of the SI Net + Cond Net are detailed in separate figures (fig:vqvae_arch_conv and fig:si_net_arch_conv, respectively).
  • Figure 3: We compare our method with other DSC methods on the KITTI Stereo image dataset. (a) Comparison of the PSNR rate-distortion curves. With the latent prior model, we match the previous state-of-the-art method (NDIC-CAM) at low bpp and outperform all methods at higher bpp. (b) Comparison of the MS-SSIM rate-distortion curves. With the latent prior model, our method matches the performance of NDIC-CAM.
  • Figure 4: Comparison of different coding schemes on CelebA-HQ. The distributed encoder nearly matches the performance of the joint encoder, while achieving a noticeable rate-distortion improvement over the "separate" encoder that does not use side information.
  • Figure 5: Reconstructions from our distributed VQ-VAE with different types of distributed encoders and under different side information. Providing wrong or random side information to the distributed VQ-VAE decoder changes the output in a semantic way (e.g., the skin tone changes, while the background remains identical).
  • ...and 8 more figures
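Figure 2 mentions training a prior over the quantized latents so that arithmetic coding lowers the rate. A useful back-of-the-envelope check is that an arithmetic coder spends close to -log2 p(k) bits on each index k, so the total rate is roughly the negative log-likelihood of the index map under the prior, and bits per pixel (bpp) is that total divided by the number of image pixels. The sketch below compares a uniform prior with a better-fitting one on a hypothetical 4x4 index map for a hypothetical 16x16-pixel image. The "learned" prior here is just the empirical index frequencies, an optimistic stand-in for a trained prior model, not the paper's prior.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def ideal_code_length_bits(indices: Sequence[int], prior: Dict[int, float]) -> float:
    """Approximate arithmetic-coding cost: sum of -log2 p(k) over the code indices.
    A practical arithmetic coder lands within a few bits of this total."""
    return sum(-math.log2(prior[k]) for k in indices)


def bits_per_pixel(total_bits: float, height: int, width: int) -> float:
    """Rate normalized by the number of pixels in the original image."""
    return total_bits / (height * width)


# Hypothetical example: a 4x4 grid of code indices from a 4-entry codebook,
# assumed to describe a 16x16-pixel image (each latent covers a 4x4 patch).
indices = [0, 0, 1, 0,
           0, 2, 0, 0,
           1, 0, 0, 3,
           0, 0, 1, 0]
uniform = {k: 0.25 for k in range(4)}  # no prior knowledge: 2 bits per index
counts = Counter(indices)
# Empirical frequencies as a stand-in for a trained prior (every index occurs here,
# so no probability is zero).
learned = {k: counts[k] / len(indices) for k in range(4)}

for name, prior in [("uniform", uniform), ("learned", learned)]:
    bits = ideal_code_length_bits(indices, prior)
    print(name, round(bits, 2), "bits,", round(bits_per_pixel(bits, 16, 16), 4), "bpp")
```

With these toy numbers the uniform prior costs 32 bits (0.125 bpp) and the frequency-based prior about 21.19 bits (about 0.0828 bpp); a skewed index distribution is what gives a latent prior something to exploit.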

Theorems & Definitions (2)

  • Proposition 1
  • Remark 1