MOC-RVQ: Multilevel Codebook-Assisted Digital Generative Semantic Communication

Yingbin Zhou, Yaping Sun, Guanying Chen, Xiaodong Xu, Hao Chen, Binhong Huang, Shuguang Cui, Ping Zhang

TL;DR

Experiments show that MOC-RVQ consistently outperforms conventional methods such as BPG and JPEG and achieves performance comparable to an analog JSCC scheme, while needing only one-sixth of the channel bandwidth ratio (CBR) and remaining directly compatible with digital transmission systems.

Abstract

Vector quantization-based image semantic communication systems have successfully boosted transmission efficiency, but face conflicting requirements between codebook design and digital constellation modulation: traditional codebooks need wide index ranges, while modulation favors few discrete states. To address this, we propose a multilevel generative semantic communication system with a two-stage training framework. In the first stage, we train a high-quality codebook, using a multi-head octonary codebook (MOC) to compress the index range. A residual vector quantization (RVQ) mechanism is also integrated for effective multilevel communication. In the second stage, a noise reduction block (NRB) based on Swin Transformer is introduced and coupled with the multilevel codebook from the first stage, which serves as a high-quality semantic knowledge base (SKB) for generative feature restoration. Finally, to simulate modern image transmission scenarios, we employ a diverse collection of high-resolution 2K images as the test set. The experimental results consistently demonstrate the superior performance of MOC-RVQ over conventional methods such as BPG or JPEG. Additionally, MOC-RVQ achieves performance comparable to an analog JSCC scheme, while needing only one-sixth of the channel bandwidth ratio (CBR) and being directly compatible with digital transmission systems.
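To make the tension between index range and modulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes 8-PSK as the 8-state constellation and a unit-energy AWGN model; both are illustrative choices, and all function names are hypothetical. An octonary codebook index fits in one 8-state symbol, whereas an index from a larger codebook (1024 entries is used here purely as an example) needs more bits than one such symbol carries.

```python
import cmath
import math
import random


def bits_per_index(codebook_size):
    """Number of bits needed to address every entry of a codebook."""
    return (codebook_size - 1).bit_length()


def psk8_symbol(index):
    """Map a code index in 0..7 to a unit-energy 8-PSK point (assumed modulation)."""
    return cmath.exp(2j * math.pi * index / 8)


def awgn(symbol, snr_db, rng):
    """Add complex Gaussian noise to a unit-energy symbol at the given SNR in dB."""
    noise_power = 10 ** (-snr_db / 10)
    sigma = math.sqrt(noise_power / 2)  # standard deviation per real dimension
    return symbol + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))


def detect(received):
    """Nearest-point detection: return the index of the closest 8-PSK symbol."""
    return min(range(8), key=lambda k: abs(received - psk8_symbol(k)))


def index_error_rate(indices, snr_db, rng):
    """Fraction of code indices that are detected incorrectly after the channel."""
    errors = sum(detect(awgn(psk8_symbol(i), snr_db, rng)) != i for i in indices)
    return errors / len(indices)


if __name__ == "__main__":
    rng = random.Random(0)
    indices = [rng.randrange(8) for _ in range(1000)]
    print("bits for an 8-entry codebook:", bits_per_index(8))
    print("bits for a 1024-entry codebook:", bits_per_index(1024))
    for snr_db in (-5, 0, 10, 30):
        print(snr_db, "dB ->", index_error_rate(indices, snr_db, rng))
```

Each octonary index occupies exactly one symbol here, so a corrupted symbol affects a single head index rather than one index into a large codebook.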

Paper Structure

This paper contains 21 sections, 9 equations, 5 figures, 1 algorithm.

Figures (5)

  • Figure 1: An essential structure for VQ-based semantic communication with the presence of a stochastic physical channel.
  • Figure 2: Left: The proposed two-stage framework. In Stage 1, we first pretrain MOC-RVQ and the other model components to generate compact representations of high-resolution images. The noise reduction block (NRB) is then finetuned in Stage 2 to achieve feature restoration. Right: The architecture of the proposed MOC and MOC-RVQ. In MOC, each residual feature $\mathbf{r}_{d-1}$ is first divided into several heads, then 8-state codebooks are employed to quantize each head feature, and finally these features are concatenated to form a single quantized feature. In MOC-RVQ, features are recursively quantized by MOCs to produce residual features and the corresponding code indices. Note that the output of MOC-RVQ is a summation over all quantized features $\mathbf{e}^{(d)}_{\mathbf{s}_d}$.
  • Figure 3: Experimental comparison using PSNR, SSIM, and LPIPS metrics over AWGN channels with SNR from -5 to 30 dB. L1 to L4 correspond to different quantization levels of MOC-RVQ, with L1 as a baseline for the VQGAN-based variant. JSCC-96 is an analog baseline with a CBR six times that of L4; it requires analog modulation or a full-resolution constellation, which hinders compatibility with current digital communication systems.
  • Figure 4: Ablation study assessing the influence of our proposed noise reduction block (NRB) and codebook reordering (CR) algorithm. Note that all experiments are conducted under L4 transmission, and 'w/o' denotes 'without'. There are clear gaps between the different settings, which verifies the effectiveness of the proposed NRB and CR designs.
  • Figure 5: Visualization of the effect of removing both the NRB and CR across varying channel conditions. Even under poor channel conditions (SNR = 0 dB), our full model reconstructs images with clear meaning, demonstrating the robustness of VQ-based semantic communication systems with the assistance of NRB and CR.
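The quantization procedure described in the Figure 2 caption can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the authors' code: the codebooks are random rather than learned, the nearest codeword is chosen by squared L2 distance, and every name (`moc_quantize`, `moc_rvq`, `decode`, `make_levels`) is hypothetical.

```python
import random


def nearest(codebook, vec):
    """Return (index, codeword) of the entry closest to vec in squared L2 distance."""
    def dist(k):
        return sum((a - b) ** 2 for a, b in zip(codebook[k], vec))
    idx = min(range(len(codebook)), key=dist)
    return idx, codebook[idx]


def moc_quantize(residual, head_codebooks):
    """MOC: split the residual into heads, quantize each head, concatenate."""
    num_heads = len(head_codebooks)
    assert len(residual) % num_heads == 0, "feature size must divide evenly into heads"
    head_dim = len(residual) // num_heads
    indices, quantized = [], []
    for h, codebook in enumerate(head_codebooks):
        head = residual[h * head_dim:(h + 1) * head_dim]
        idx, codeword = nearest(codebook, head)
        indices.append(idx)          # one 8-state index per head
        quantized.extend(codeword)   # concatenation of the head codewords
    return indices, quantized


def moc_rvq(feature, levels):
    """MOC-RVQ: each level quantizes the residual left by the previous levels."""
    residual = list(feature)
    output = [0.0] * len(feature)
    all_indices = []
    for head_codebooks in levels:
        indices, quantized = moc_quantize(residual, head_codebooks)
        all_indices.append(indices)
        output = [o + q for o, q in zip(output, quantized)]      # running sum
        residual = [r - q for r, q in zip(residual, quantized)]  # next residual
    return all_indices, output


def decode(all_indices, levels):
    """Receiver side: rebuild the quantized feature from indices and shared codebooks."""
    dim = sum(len(codebook[0]) for codebook in levels[0])
    output = [0.0] * dim
    for indices, head_codebooks in zip(all_indices, levels):
        quantized = []
        for idx, codebook in zip(indices, head_codebooks):
            quantized.extend(codebook[idx])
        output = [o + q for o, q in zip(output, quantized)]
    return output


def make_levels(num_levels, num_heads, head_dim, rng, states=8):
    """Random stand-in codebooks (assumption): levels[d][h] holds `states` codewords."""
    return [[[[rng.uniform(-1, 1) * 0.5 ** d for _ in range(head_dim)]
              for _ in range(states)]
             for _ in range(num_heads)]
            for d in range(num_levels)]


if __name__ == "__main__":
    rng = random.Random(0)
    levels = make_levels(num_levels=4, num_heads=2, head_dim=3, rng=rng)
    feature = [rng.uniform(-1, 1) for _ in range(6)]
    indices, quantized = moc_rvq(feature, levels)
    print(indices)                             # 4 levels x 2 heads, each index in 0..7
    print(decode(indices, levels) == quantized)
```

Only the per-level, per-head indices need to be transmitted in this sketch; the receiver rebuilds the quantized feature from the shared codebooks. Passing fewer entries of `levels` corresponds to transmitting at a lower quantization level.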