
VQ-DeepVSC: A Dual-Stage Vector Quantization Framework for Video Semantic Communication

Yongyi Miao, Zhongdang Li, Yang Wang, Die Hu, Jun Yan, Youfang Wang

TL;DR

VQ-DeepVSC introduces a dual-stage, deep-learning-based vector quantization framework for video semantic communication over wireless channels. The first stage, AKEI, uses IFNet- and FusionNet-based key-frame extraction and interpolation to reduce temporal redundancy and mitigate the cliff effect at low SNRs. The second stage, MSVQ, compresses key frames using a shared latent embedding space (MOC-RVQ) and SCN-based decoding to reduce intra-frame redundancy and support high-resolution video. An adjustable index selector/restorer further reduces redundancy by encoding only the changes in frame indices, which enables flexible compression ratios. Experiments on UCF101 show better MS-SSIM and LPIPS than H.265 at similar bandwidth compression ratios (BCRs), with robust performance under AWGN and multipath fading channels. These results suggest practical viability and compatibility with existing digital communication systems.
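The residual quantization idea behind MOC-RVQ can be illustrated with a minimal sketch. It assumes that RVQ denotes residual vector quantization. The summary does not describe the exact MOC-RVQ design, including what "MOC" stands for and how the codebook is trained. This sketch therefore uses plain residual vector quantization in which every stage reuses one shared codebook. The function names `nearest`, `rvq_encode`, and `rvq_decode` and the toy codebook are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Return the index of the codeword closest to x (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for i, c in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(x, c))
        if d < best_dist:  # strict '<' keeps the first codeword on ties
            best_idx, best_dist = i, d
    return best_idx


def rvq_encode(x: Sequence[float], codebook: Sequence[Sequence[float]],
               num_stages: int) -> List[int]:
    """Residual VQ with a single codebook shared by all stages (an assumption).

    Each stage quantizes what the previous stages left over, so only a list
    of integer indices has to be transmitted.
    """
    residual = list(x)
    indices: List[int] = []
    for _ in range(num_stages):
        k = nearest(codebook, residual)
        indices.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return indices


def rvq_decode(indices: Sequence[int], codebook: Sequence[Sequence[float]]) -> Vector:
    """Reconstruct by summing the selected codewords of every stage."""
    out = [0.0] * len(codebook[0])
    for k in indices:
        out = [o + c for o, c in zip(out, codebook[k])]
    return out
```

Take a toy codebook [[0, 0], [1, 0], [0, 1], [0.5, 0], [0, 0.5], [0.25, 0], [0, 0.25]] and encode [1.25, 0.5] in three stages. The result is the index list [1, 4, 5], which decodes back to [1.25, 0.5] exactly. Only the integer indices need to cross the channel, and that is one plausible reason a VQ-based design fits conventional digital modulation better than analog JSCC outputs.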

Abstract

In response to the rapid growth of global video traffic and the limitations of traditional wireless transmission systems, we propose a novel dual-stage vector quantization framework, VQ-DeepVSC, tailored to enhance video transmission over wireless channels. In the first stage, we design the adaptive key-frame extractor and interpolator, deployed at the transmitter and receiver respectively, which intelligently select key frames to minimize inter-frame redundancy and mitigate the cliff effect under challenging channel conditions. In the second stage, we propose the semantic vector quantization encoder and decoder, placed at the transmitter and receiver respectively, which efficiently compress key frames using advanced indexing and spatial normalization modules to reduce redundancy. Additionally, we propose adjustable index selection and recovery modules that improve compression efficiency and enable flexible adjustment of the compression ratio. Compared with the joint source-channel coding (JSCC) framework, the proposed framework is more compatible with current digital communication systems. Experimental results demonstrate that VQ-DeepVSC achieves substantial improvements over the H.265 standard in both Multi-Scale Structural Similarity (MS-SSIM) and Learned Perceptual Image Patch Similarity (LPIPS), particularly at low channel signal-to-noise ratios (SNRs) or over multipath channels, highlighting the significantly enhanced transmission capabilities of our approach.
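The adjustable index selection and recovery step can be sketched as a simple delta scheme. The sketch assumes each key frame is represented by a flat map of codebook indices. The selector sends only the (position, new index) pairs that differ from the previous map, and the restorer patches its copy of the previous map. The optional `budget` cap, which trades fidelity for rate, is a hypothetical stand-in for the paper's adjustable mechanism. The function names `select_indices` and `restore_indices` are also illustrative, not the authors' API.

```python
from typing import List, Optional, Sequence, Tuple

# (position in the index map, new codebook index)
Update = Tuple[int, int]


def select_indices(prev: Sequence[int], curr: Sequence[int],
                   budget: Optional[int] = None) -> List[Update]:
    """Transmitter side: keep only the indices that changed since the previous frame.

    `budget` (hypothetical) caps how many changes are sent. Smaller values give
    a higher compression ratio at the cost of stale indices at the receiver.
    This sketch simply keeps the earliest changes when the cap is hit.
    """
    if len(prev) != len(curr):
        raise ValueError("index maps must have the same length")
    changes = [(pos, c) for pos, (p, c) in enumerate(zip(prev, curr)) if p != c]
    if budget is not None:
        changes = changes[:budget]
    return changes


def restore_indices(prev: Sequence[int], updates: Sequence[Update]) -> List[int]:
    """Receiver side: apply the received updates to a copy of the previous index map."""
    restored = list(prev)
    for pos, idx in updates:
        restored[pos] = idx
    return restored
```

Take prev = [3, 3, 7, 1, 0] and curr = [3, 5, 7, 2, 0]. `select_indices` returns [(1, 5), (3, 2)], and `restore_indices` rebuilds `curr` exactly. With `budget=1`, only (1, 5) is sent and position 3 keeps its stale value 1. Truncating to the earliest changes is the simplest possible policy; a distortion-aware ranking of the changes could replace it.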

Paper Structure (27 sections, 20 equations, 7 figures)

Figures (7)

  • Figure 1: The overall system architecture of the proposed VQ-DeepVSC.
  • Figure 2: The structure of the adaptive key-frame extractor.
  • Figure 3: The structure of the adaptive key-frame interpolator.
  • Figure 4: The structure of the semantic vector quantization encoder.
  • Figure 5: The structure of the semantic vector quantization decoder.
  • ...and 2 more figures