The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs

Junli Fang, João F. C. Mota, Baoshan Lu, Weicheng Zhang, Xuemin Hong

TL;DR

This work introduces the rate-distortion-perception-classification (RDPC) tradeoff in joint source coding and modulation (JSCM), showing that minimizing channel rate under distortion, perceptual, and classification constraints yields a strict, convex tradeoff. It provides two complementary solutions: RDPCO, a heuristic optimizer that assumes a Gaussian mixture model (GMM) source and linear encoders/decoders, and ID-GAN, an inverse-domain GAN framework that learns end-to-end encoders/decoders to balance reconstruction, perceptual quality, and semantic classification under channel noise. Theoretical results include a tight bound on RDPC under GMM assumptions and a convexity property of the rate function $R(D,P,C)$. Empirical validation shows that both methods exhibit the RDPC tradeoff and that ID-GAN achieves better perceptual and semantic performance than separation-based and existing deep JSCM methods. Together, the methods enable extreme compression while preserving perceptual quality and classification accuracy, which is relevant to robust, task-aware communications and semantic transmission.
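
The constrained problem described above can be sketched as follows. This is a schematic only, with assumed notation: $\Delta$ for a distortion measure, $p_{\bm{X}}$ and $p_{\hat{\bm{X}}}$ for the source and reconstruction distributions, and $\epsilon_{c_0}$ for the error of the classifier $c_0$ on the reconstruction. The paper's own formulation (eq:problem-general) may differ in its details.

```latex
% Schematic RDPC problem (assumed notation, not the paper's exact statement)
R(D, P, C) \;=\; \min_{\text{encoder},\ \text{decoder}} \ \text{channel rate}
\quad \text{subject to} \quad
\begin{cases}
  \mathbb{E}\big[\Delta(\bm{X}, \hat{\bm{X}})\big] \le D & \text{(distortion)} \\
  d\big(p_{\bm{X}},\, p_{\hat{\bm{X}}}\big) \le P & \text{(perception)} \\
  \epsilon_{c_0}(\hat{\bm{X}}) \le C & \text{(classification error)}
\end{cases}

% Convexity of R means that, for any two constraint triples and any
% \lambda \in [0, 1],
R\big(\lambda (D_1, P_1, C_1) + (1 - \lambda)(D_2, P_2, C_2)\big)
\;\le\;
\lambda\, R(D_1, P_1, C_1) + (1 - \lambda)\, R(D_2, P_2, C_2).
```

Read this way, the tradeoff says that tightening any one of $D$, $P$, or $C$ can only keep the required rate the same or increase it.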

Abstract

The joint source-channel coding (JSCC) framework leverages deep learning to learn from data the best codes for source and channel coding. When the output signal, rather than being binary, is directly mapped onto the IQ domain (complex-valued), we call the resulting framework joint source coding and modulation (JSCM). We consider a JSCM scenario and show the existence of a strict tradeoff between channel rate, distortion, perception, and classification accuracy, a tradeoff that we name RDPC. We then propose two image compression methods to navigate that tradeoff: the RDPCO algorithm which, under simple assumptions, directly solves the optimization problem characterizing the tradeoff, and an algorithm based on an inverse-domain generative adversarial network (ID-GAN), which is more general and achieves extreme compression. Simulation results corroborate the theoretical findings, showing that both algorithms exhibit the RDPC tradeoff. They also demonstrate that the proposed ID-GAN algorithm effectively balances image distortion, perception, and classification accuracy, and significantly outperforms traditional separation-based methods and recent deep JSCM architectures in terms of one or more of these metrics.
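
To make the idea of mapping a source directly onto complex IQ symbols concrete, here is a minimal NumPy sketch of a linear JSCM link. It is a toy under stated assumptions and is not the paper's RDPCO or ID-GAN. The Gaussian source with decaying variances, the function name `simulate_linear_jscm`, the unit-power normalization, and the per-coordinate LMMSE decoder are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_linear_jscm(n=16, m=4, snr_db=10.0, num_samples=20000, seed=0):
    """Toy linear JSCM over a complex AWGN channel (illustrative assumption).

    A real source vector of dimension n is mapped to m complex IQ symbols,
    sent through additive complex Gaussian noise, and linearly decoded.
    Returns the empirical mean squared error per source vector.
    """
    if not 1 <= 2 * m <= n:
        raise ValueError("need 1 <= 2*m <= n")
    rng = np.random.default_rng(seed)

    # Hypothetical source: independent zero-mean Gaussians with variances 1/i.
    variances = 1.0 / np.arange(1, n + 1)
    x = rng.normal(size=(num_samples, n)) * np.sqrt(variances)

    # Linear encoder: keep the 2m highest-variance coordinates and pair them
    # into m complex symbols (first m on the I axis, next m on the Q axis).
    kept = variances[: 2 * m]
    gain = np.sqrt(m / kept.sum())  # unit average power per complex symbol
    symbols = gain * (x[:, :m] + 1j * x[:, m : 2 * m])

    # Complex AWGN: noise variance per symbol is set by the SNR in dB.
    noise_var = 10.0 ** (-snr_db / 10.0)
    noise = np.sqrt(noise_var / 2.0) * (
        rng.normal(size=symbols.shape) + 1j * rng.normal(size=symbols.shape)
    )
    received = symbols + noise

    # Linear decoder: per-coordinate LMMSE estimate of the kept coordinates;
    # coordinates that were never sent are estimated by their mean (zero).
    y = np.concatenate([received.real, received.imag], axis=1)
    weights = gain * kept / (gain**2 * kept + noise_var / 2.0)
    x_hat = np.zeros_like(x)
    x_hat[:, : 2 * m] = weights * y

    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))


if __name__ == "__main__":
    for m in (1, 2, 4, 8):
        print(f"m={m}: MSE={simulate_linear_jscm(m=m):.3f}")
```

In this toy, using more channel symbols (larger `m`) or a higher `snr_db` lowers the distortion, which is the rate-distortion side of the tradeoff. It includes no perception or classification terms.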

Paper Structure

This paper contains 22 sections, 4 theorems, 45 equations, 10 figures, and 2 algorithms.

Key Result

Theorem 1

Let $\bm{X}$ be a multiclass model as in eq:SourceModel. Consider the communication scheme in eq:channeldiagram and the associated RDPC problem in eq:problem-general. Assume the classifier $c_0$ is deterministic and that the perception function $d(\cdot, \cdot)$ is convex in its second argument. The

Figures (10)

  • Figure 1: Proposed ID-GAN framework for solving the RDPC problem \ref{eq:problem-general}. The decoder is first trained adversarially with critic 1 in the first step (top). The decoder is then fixed and coupled with an encoder, which is in turn trained with critic 2 in order to preserve both reconstruction quality and classification accuracy (bottom). Critics 1 and 2 have the same architecture.
  • Figure 2: Values of (a) distortion, (b) perception, and (c) classification error for RDPCO for varying distortion parameter $D$. These metrics are computed by the right-hand side of the expressions in \ref{eq:finalexpressiondistortion}, \ref{eq:boundWasserstein}, and \ref{eq:bhattacharyya-our}, respectively.
  • Figure 3: Values of (a) distortion, (b) perception, and (c) classification error for RDPCO for varying latent dimension $m$ and hyperparameters $(P, C) = (4.1, 0.1)$.
  • Figure 4: Rate-distortion curves of RDPCO for (a) varying $P$ and $C$, and (b) varying compressed dimension $m$ with hyperparameters $(P, C) = (4.1, 0.1)$.
  • Figure 5: Architectures of the decoder $d(\cdot\, ;\, \bm{\theta}_d)$ and of the critics $f_1$ and $f_2$ in ID-GAN [cf. Fig. \ref{fig:ID-GAN}]. FC stands for fully connected layer, conv for convolutional layer, and conv_transp for transposed convolutional layer. We indicate the dimensions of the layer as well as the size of the kernels, stride, and padding.
  • ...and 5 more figures

Theorems & Definitions (8)

  • Theorem 1
  • proof
  • Lemma 3
  • proof
  • Theorem
  • proof
  • Lemma
  • proof