Symbol Distributions in Semantic Communications: A Source-Channel Equilibrium Perspective
Hanju Yoo, Dongha Choi, Songkuk Kim, Chan-Byoung Chae, Robert W. Heath
TL;DR
The paper explains why semantic-encoder symbols exhibit heavy-tailed distributions by framing a trade-off between source coding efficiency and channel-throughput maximization. It derives a theoretical result: under a joint objective and power-constrained symbols, the pre-noise symbol distribution follows a scaled Student's $t$-distribution, which bridges the Gaussian and Cauchy extremes. Through extensive experiments with DeepJSCC and NTSCC on ImageNet and CIFAR-10, it shows how the tail parameter $\nu$ (the degrees of freedom of the $t$-distribution) shifts with coding scheme, dataset entropy variability, and channel SNR, validating the theory. Finally, it introduces a distribution-regulating KL loss that steers the encoder toward a target prior, significantly improving training convergence in certain regimes and offering a principled design tool for semantic communication systems.
Abstract
Semantic communication systems often use an end-to-end neural network to map input data into continuous symbols. These symbols, which are essentially neural network features, usually have fixed dimensions and heavy-tailed distributions. However, because the neural network encoder is trained end to end, the underlying reason for the symbol distribution remains underexplored. We propose a new explanation for the semantic symbol distribution: an inherent trade-off between source coding and communications. Specifically, the encoder balances two objectives: allocating power for minimum \emph{effective codelength} (for source coding) and maximizing mutual information (for communications). We formalize this trade-off via an information-theoretic optimization framework, which yields a Student's $t$-distribution as the resulting symbol distribution. Through extensive studies on image-based semantic systems, we find that our formulation models the learned symbols and predicts how the symbol distribution's shape parameter changes with respect to (i) the use of variable-length coding and (ii) the dataset's entropy variability. Furthermore, we demonstrate that a regularizer that guides the encoder towards a target symbol distribution (e.g., Gaussian) improves training convergence and supports our hypothesis.
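
As a small numerical illustration of why a scaled Student's $t$-distribution can describe both light-tailed and heavy-tailed symbols, the sketch below evaluates the standard zero-mean $t$ density and compares it with the Cauchy density ($\nu = 1$) and the Gaussian density (large $\nu$). This is not the paper's code: the function names `student_t_pdf`, `cauchy_pdf`, and `gaussian_pdf`, the `scale` parameter, and the evaluation points are assumptions chosen for illustration, and only the textbook density formulas are used.

```python
import math


def student_t_pdf(x, nu, scale=1.0):
    """Density of a zero-mean, scaled Student's t with nu degrees of freedom.

    Hypothetical helper for illustration; computed in the log domain so that
    very large nu does not overflow the gamma function.
    """
    z = x / scale
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(scale)
    )
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(z * z / nu))


def cauchy_pdf(x, scale=1.0):
    """Cauchy density, the nu = 1 special case of the t density."""
    return 1.0 / (math.pi * scale * (1.0 + (x / scale) ** 2))


def gaussian_pdf(x, sigma=1.0):
    """Zero-mean Gaussian density, the nu -> infinity limit of the t density."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


if __name__ == "__main__":
    # Compare the densities at a few illustrative points.
    for x in (0.0, 1.0, 3.0, 5.0):
        print(
            f"x={x:>3}: t(nu=1)={student_t_pdf(x, 1):.6f} "
            f"cauchy={cauchy_pdf(x):.6f} | "
            f"t(nu=1e6)={student_t_pdf(x, 1e6):.6f} "
            f"gauss={gaussian_pdf(x):.6f}"
        )
```

Intermediate values of $\nu$ fall between these two extremes: smaller $\nu$ puts more probability mass in the tails, which is the sense in which the shape parameter summarizes how heavy-tailed the learned symbols are.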
