Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency

Jincheng Dai, Xiaoqi Qin, Sixian Wang, Lexi Xu, Kai Niu, Ping Zhang

TL;DR

This work frames deep probabilistic generative modeling as a bridge between Shannon information theory's data compression and distortion correction for reliable transmission, proposing end-to-end generative approaches to improve both efficiency and resiliency. It analyzes neural compression across continuous (infinite) and discrete (finite) quantization regimes, introduces nonlinear transform coding (NTC), and delineates four paradigms (lossless, lossy, perceptual, and semantic) within rate–distortion trade-offs ($R$-$D$) and rate–distortion–perception trade-offs ($R$-$D$-$P$). It advances joint source–channel coding (JSCC) by presenting strongly-coupled JSCC with end-to-end learned mappings and weakly-coupled JSCC that relies on latent-packet loss concealment via masked-transformer (MT) priors, including nonlinear transform source–channel coding (NTSCC) and latent conditioning for posterior sampling. The findings indicate that generative priors enable graceful degradation and improved resilience under adverse channels, while token-based semantic compression and MT-based latent modeling push toward robust, semantically meaningful end-to-end communication. Overall, the paper gives a unified perspective linking foundation generative models with both source and channel coding to enable efficient, resilient, and potentially semantically aware communications, and it frames future work at the intersection of compression, transmission, and higher-level cognitive capabilities.
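
For orientation, the two trade-offs named above are commonly formalized as constrained minimizations of mutual information. This is the standard information-theoretic formulation and is not taken from the paper; the symbols $X$ (source), $\hat{X}$ (reconstruction), $\Delta$ (distortion measure), and $d$ (divergence between distributions) are assumptions introduced here for illustration.

$$
R(D) = \min_{p_{\hat{X}|X}} I(X;\hat{X}) \quad \text{s.t.} \quad \mathbb{E}\big[\Delta(X,\hat{X})\big] \le D
$$

$$
R(D,P) = \min_{p_{\hat{X}|X}} I(X;\hat{X}) \quad \text{s.t.} \quad \mathbb{E}\big[\Delta(X,\hat{X})\big] \le D, \quad d\big(p_X, p_{\hat{X}}\big) \le P
$$

In this formulation a smaller $P$ demands more realistic reconstructions. Letting $P \to \infty$ removes the realism constraint and recovers $R(D)$, so $R(D,P) \ge R(D)$ for every $P$: realism can only be bought with extra rate or extra distortion.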

Abstract

Information theory and machine learning are inextricably linked and have even been referred to as "two sides of the same coin". One particularly elegant connection is the essential equivalence between probabilistic generative modeling and data compression or transmission. In this article, we reveal the dual functionality of deep generative models, which reshapes both data compression for efficiency and transmission error concealment for resiliency. We show how the contextual predictive capabilities of powerful generative models position them as strong compressors and estimators. In this sense, we advocate for viewing the deep generative modeling problem through the lens of end-to-end communications, and evaluate the compression and error restoration capabilities of foundation generative models. We show that the kernel of many large generative models is a powerful predictor that can capture complex relationships among semantic latent variables, and that the communication viewpoint provides novel insights into semantic feature tokenization, contextual learning, and the usage of deep generative models. In summary, our article highlights the essential connections of generative AI to source and channel coding techniques, and motivates researchers to make further explorations in this emerging topic.
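
The equivalence between probabilistic modeling and compression mentioned above can be made concrete with a small sketch. It assumes an idealized entropy coder that spends $-\log_2 p(x)$ bits on a symbol the model assigns probability $p(x)$. The toy message, the two memoryless models, and the function name `ideal_code_length_bits` are hypothetical and are not drawn from the paper.

```python
import math


def ideal_code_length_bits(symbols, model):
    """Ideal (Shannon) code length: sum of -log2 p(symbol) under `model`.

    `model` maps each symbol to the probability the model assigns to it.
    A real entropy coder (e.g. arithmetic coding) approaches this total
    with a small overhead; that overhead is ignored here.
    """
    return sum(-math.log2(model[s]) for s in symbols)


# Hypothetical toy message and two hypothetical memoryless models.
message = "aaab"
uniform_model = {"a": 0.5, "b": 0.5}    # knows nothing about the source
skewed_model = {"a": 0.75, "b": 0.25}   # matches the symbol frequencies

bits_uniform = ideal_code_length_bits(message, uniform_model)
bits_skewed = ideal_code_length_bits(message, skewed_model)

print(f"uniform model: {bits_uniform:.3f} bits")
print(f"skewed model:  {bits_skewed:.3f} bits")
```

The model that predicts the message better yields the shorter code. The same accounting applies when the probabilities come from a context-dependent deep generative model rather than a fixed table.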

Paper Structure (8 sections, 6 figures)

Figures (6)

  • Figure 1: Illustration of different compression paradigms enabled by deep generative models. In this figure, the solid color blocks stand for latent variables or source data that have been encoded as bit sequences to be transmitted or stored, and the dashed color blocks represent the components that are not encoded but are instead predicted using the probabilistic information provided by generative models. Here, $R$ denotes bit-rate cost, $D$ denotes distortion, and $P$ denotes the perceptual quality (realism) metric. The upward arrow "$\uparrow$" indicates "higher value is better", and vice versa. The change trends of distortion and realism with bit-rate are illustrated.
  • Figure 2: Overview of data compression systems enabled by deep generative modeling. This figure presents two distinct roadmaps tailored for high and low bit-rate regions, respectively. It illustrates how continuous and discrete generative models are integrated to regulate the bit-rate of latent variables in data compression systems. Here, $\bm{z}$ and $\bm{t}$ denote quantized latent codes, and $\bm{s}$ denotes semantic guidance.
  • Figure 3: Illustration of four types of image compression in terms of the rate-distortion curve. Lossless compression, targeting pixel-perfect reconstructions, typically results in higher bit rates (around 9.07 bits per pixel (bpp)) compared to lossy compression. Neural lossy compression algorithms generally operate above 0.1 bpp, with distortions becoming nearly imperceptible above 1 bpp, as visually evidenced. However, at rates below 1 bpp, lossy compression can introduce blurry artifacts, impacting visual perception. Perceptual compression, operating between 0.02 and 1 bpp, is designed to generate realistic textures, maintaining perceptual quality at reduced rates, as seen in the third image from the right. When rates fall below 0.1 bpp, the syntactic fidelity drops, and textures become distorted. At extremely low bit-rates (below 0.02 bpp), where traditional methods falter, semantic compression steps in, focusing on preserving essential visual information. Utilizing advanced generative models, semantic compression can generate meaningful reconstructions from minimal data, such as a short text prompt (0.0023 bpp) or a sketch prompt (0.0097 bpp), preserving key semantic elements and structural information.
  • Figure 4: Overview of data transmission systems enabled by deep generative models. This figure presents two distinct roadmaps, in which source compression and channel transmission are strongly coupled and weakly coupled, respectively.
  • Figure 5: Performance comparison of three state-of-the-art wireless image transmission methods, where the example photo is sampled from the widely used FFHQ dataset (specifically image ID: 69037). The upward arrow "$\uparrow$" indicates "higher value is better", and vice versa. For each method, we visualize its reconstructions as the channel signal-to-noise ratio (SNR) decreases from 6 dB to $-$4 dB. The visual quality of these methods is evaluated in terms of consistency, realism, and distortion, measured by LPIPS, FID, and MSE metrics, respectively. In particular, for VTM + 5G LDPC + QAM with the adaptive modulation and coding (AMC) mechanism, $\text{SNR}_{\text{AMC}}$ in the first row denotes the estimated SNR used to select the coding rate of the LDPC code and the QAM level, corresponding to the actual channel SNR displayed in the second row.
  • ...and 1 more figure
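
To give a feel for the bit rates quoted in the Figure 3 caption, the sketch below converts bits per pixel (bpp) into a total payload size. The 512×512 resolution is an assumption chosen only for illustration, since the caption does not state the image size. The helper name `bpp_to_bytes` is likewise hypothetical.

```python
def bpp_to_bytes(bpp, width, height):
    """Convert a rate in bits per pixel into a total payload size in bytes."""
    total_bits = bpp * width * height
    return total_bits / 8


# Assumed resolution (not given in the caption); change it to match your images.
WIDTH, HEIGHT = 512, 512

# Rates quoted in the Figure 3 caption.
rates = {
    "lossless": 9.07,
    "lossy, near-imperceptible distortion": 1.0,
    "semantic, sketch prompt": 0.0097,
    "semantic, text prompt": 0.0023,
}

for label, bpp in rates.items():
    size = bpp_to_bytes(bpp, WIDTH, HEIGHT)
    print(f"{label:40s} {bpp:8.4f} bpp -> {size:12.1f} bytes")
```

Under this assumed resolution, the text-prompt rate corresponds to a payload of roughly 75 bytes, which is why reconstruction at such rates must lean on a generative prior rather than on transmitted pixel detail.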