Variable-Length Joint Source-Channel Coding for Semantic Communication
Yujie Zhou, Rulong Wang, Yong Xiao, Yingyu Li, Guangming Shi
TL;DR
This work tackles the mismatch between continuous deep-JSCC encodings and discrete digital systems by introducing E2EC, an end-to-end coding framework that extends the information bottleneck to noisy channels and enables variable-length, discrete coding. By decomposing encoding into a length predictor and a content encoder, E2EC achieves bit-level rate control while maintaining semantic fidelity, using policy-gradient methods to train through non-differentiable steps. The transformed objective provides a computable bound via a variational cross-entropy, and the decoder employs a one-to-one embedding that maps binary codewords to semantically meaningful representations. Experiments on MNIST over a binary symmetric channel (BSC) show that E2EC outperforms fixed-length baselines and adapts its rate and content to channel conditions, highlighting a practical path toward efficient, digital SemCom systems.
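To make the length-predictor / content-encoder split concrete, the following is a minimal sketch of how a variable-length binary codeword could be sampled and sent over a BSC. It is not the authors' implementation: the function names (`sample_length`, `sample_bits`, `bsc`), the probability vectors, and the crossover probability of 0.1 are all illustrative assumptions, and in E2EC the two distributions would presumably be produced by neural networks rather than fixed by hand.

```python
import random

def sample_length(length_probs, rng):
    """Sample a code length (1-based) from a categorical distribution."""
    u = rng.random()
    cumulative = 0.0
    for i, p in enumerate(length_probs):
        cumulative += p
        if u < cumulative:
            return i + 1
    return len(length_probs)  # guard against floating-point round-off

def sample_bits(bit_probs, length, rng):
    """Sample the first `length` bits, each Bernoulli with its own P(bit = 1)."""
    return [1 if rng.random() < bit_probs[i] else 0 for i in range(length)]

def bsc(bits, flip_prob, rng):
    """Binary symmetric channel: flip each bit independently with flip_prob."""
    return [(b ^ 1) if rng.random() < flip_prob else b for b in bits]

rng = random.Random(0)
# Hypothetical outputs of a length predictor and a content encoder.
length_probs = [0.1, 0.2, 0.3, 0.4]   # P(length = 1..4)
bit_probs = [0.9, 0.1, 0.8, 0.5]      # P(bit_i = 1)

n = sample_length(length_probs, rng)
codeword = sample_bits(bit_probs, n, rng)
received = bsc(codeword, flip_prob=0.1, rng=rng)
print(n, codeword, received)
```

Both sampling steps are discrete and therefore non-differentiable, which is why a score-function (policy-gradient) estimator is needed to train through them.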
Abstract
This paper investigates a key challenge faced by joint source-channel coding (JSCC) in digital semantic communication (SemCom): existing JSCC schemes yield continuous encoded representations, whereas digital systems employ discrete variable-length codewords. This incompatibility makes it difficult to achieve physical bit-level rate control with such JSCC approaches for efficient semantic transmission. To address it, we propose a novel end-to-end coding (E2EC) framework. The semantic coding problem is formulated by extending information bottleneck (IB) theory to noisy channels, yielding a tradeoff between bit-level communication rate and semantic distortion. By structurally decomposing the encoder into separate components for code length and code content, we construct an end-to-end trainable encoder that directly compresses a data source into a finite codebook. To optimize E2EC across non-differentiable operations such as sampling, we use policy gradient methods to enable gradient-based updates. Experimental results show that E2EC achieves high inference quality at low bit rates, outperforming representative baselines compatible with digital SemCom systems.
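As an illustration of how a policy gradient can train through a sampling step, the sketch below applies a REINFORCE-style score-function update to a categorical distribution over code lengths. It is a toy under stated assumptions, not the paper's training procedure: `toy_cost` is a hypothetical stand-in for the rate-distortion objective (distortion shrinking with length plus a rate penalty weighted by `lam`), and the moving-average baseline, learning rate, and step count are arbitrary choices.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_cost(length, lam):
    """Hypothetical cost: distortion falls with length, rate penalty grows."""
    return 1.0 / length + lam * length

def train_length_policy(max_len=6, lam=0.25, steps=3000, lr=0.1, seed=0):
    """Minimize expected cost over lengths with a score-function estimator."""
    rng = random.Random(seed)
    logits = [0.0] * max_len
    baseline = 0.0  # moving average of the cost, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        k = rng.choices(range(max_len), weights=probs)[0]  # sampled index
        cost = toy_cost(k + 1, lam)
        advantage = cost - baseline
        baseline += 0.05 * (cost - baseline)
        for i in range(max_len):
            # d/d logit_i of log p(k) for a softmax policy.
            grad_logp = (1.0 if i == k else 0.0) - probs[i]
            # Gradient descent on the expected cost.
            logits[i] -= lr * advantage * grad_logp
    return softmax(logits)

probs = train_length_policy()
print([round(p, 3) for p in probs])
```

The update only needs the sampled length and its scalar cost, so no gradient has to flow through the sampling operation itself; lengths with below-average cost gain probability mass over time.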
