Learned Image Transmission with Hierarchical Variational Autoencoder

Guangyi Zhang, Hanlei Li, Yunlong Cai, Qiyu Hu, Guanding Yu, Runmin Zhang

TL;DR

This paper tackles robust, high-efficiency image transmission over wireless channels by introducing a hierarchical joint source-channel coding (HJSCC) framework built on a hierarchical variational autoencoder. The transmitter uses both bottom-up and top-down paths to autoregressively generate multiple latent representations, which are encoded by several JSCC encoders; masking and entropy-informed priors make the transmission dynamic and rate-adaptive. A novel training objective couples rate terms derived from the latent priors with distortion terms. The approach is also extended to a JSCC-with-feedback setting, in which transmission over a noisy channel is modeled as a probabilistic sampling process. Experimental results on Kodak and CLIC2022 show that HJSCC achieves better rate-distortion performance than existing baselines and is robust to channel noise, with ablations confirming the effectiveness of spatial grouping and the rate attention module. The work offers a practical path toward scalable, adaptive, and feedback-enabled image transmission in future networks.
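
The rate-adaptive masking mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each latent representation comes with an estimated rate in bits (from its prior), that the number of transmitted channel symbols grows linearly with that rate, and that unused symbol slots are zeroed by a binary mask. The names `symbols_for_rate`, `mask_symbols`, `eta`, and `k_max` are hypothetical.

```python
import numpy as np

def symbols_for_rate(rate_bits: np.ndarray, eta: float, k_max: int) -> np.ndarray:
    """Map estimated rates (bits) to symbol counts; the linear rule is an assumption."""
    k = np.ceil(eta * rate_bits).astype(int)
    return np.clip(k, 0, k_max)

def mask_symbols(symbols: np.ndarray, k: np.ndarray):
    """Keep the first k[i] symbols of row i and zero out the rest."""
    idx = np.arange(symbols.shape[1])
    mask = idx[None, :] < k[:, None]          # shape (num_latents, k_max)
    return symbols * mask, mask

# Three hypothetical latents with very different estimated rates.
rate_bits = np.array([0.0, 3.2, 100.0])
k = symbols_for_rate(rate_bits, eta=0.5, k_max=8)

rng = np.random.default_rng(0)
symbols = rng.standard_normal((3, 8))         # stand-in for JSCC encoder outputs
masked, mask = mask_symbols(symbols, k)
print(k, mask.sum(axis=1))
```

Under these assumptions, a latent with a low estimated rate occupies few or no channel symbols, while an information-rich latent uses the full budget `k_max`.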

Abstract

In this paper, we introduce a hierarchical joint source-channel coding (HJSCC) framework for image transmission, built on a hierarchical variational autoencoder (VAE). Our approach combines bottom-up and top-down paths at the transmitter to autoregressively generate multiple hierarchical representations of the original image. These representations are then directly mapped to channel symbols for transmission by the JSCC encoder. We extend this framework to scenarios with a feedback link, modeling transmission over a noisy channel as a probabilistic sampling process and deriving a novel generative formulation for JSCC with feedback. Compared with existing approaches, the proposed HJSCC is more adaptable: it dynamically adjusts the transmission bandwidth by encoding these representations into varying numbers of channel symbols. Extensive experiments on images of varying resolutions demonstrate that our proposed model outperforms existing baselines in rate-distortion performance and maintains robustness against channel noise. The source code will be made available upon acceptance.
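
To make the "probabilistic sampling" view of a noisy channel concrete, here is a minimal sketch that assumes a complex AWGN channel and unit-power channel symbols; the abstract does not specify the channel model, so both are assumptions. Passing symbols `x` through the channel is then the same as drawing one sample `y` from a complex Gaussian centered at `x`. The function names are hypothetical.

```python
import numpy as np

def power_normalize(symbols: np.ndarray) -> np.ndarray:
    """Scale complex symbols so that their average power is 1."""
    avg_power = np.mean(np.abs(symbols) ** 2)
    return symbols / np.sqrt(avg_power)

def awgn_channel(symbols: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Draw y ~ CN(x, sigma^2 I), with sigma^2 set by the SNR for unit-power x."""
    noise_var = 10.0 ** (-snr_db / 10.0)
    noise = np.sqrt(noise_var / 2.0) * (
        rng.standard_normal(symbols.shape) + 1j * rng.standard_normal(symbols.shape)
    )
    return symbols + noise

rng = np.random.default_rng(0)
latent = rng.standard_normal(2048)                        # stand-in for an encoder output
x = power_normalize(latent[0::2] + 1j * latent[1::2])     # pair reals into complex symbols
y = awgn_channel(x, snr_db=10.0, rng=rng)                 # one sample of the received signal
```

Because the received signal is a sample from a known conditional distribution, the channel can be treated like any other stochastic layer in a generative model, which is the general idea behind the formulation described above.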

Paper Structure

This paper contains 21 sections, 10 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Probabilistic model of VAEs and hierarchical ResNet VAE. The bias is a trainable parameter.
  • Figure 2: Diagram of a deep learning-based JSCC system.
  • Figure 3: The probabilistic diagram of the proposed HJSCC. The transmitter employs the bottom-up and top-down paths for encoding, while the receiver reconstructs the image with the received symbols.
  • Figure 4: The probabilistic diagram of the proposed HJSCC with feedback.
  • Figure 5: The illustration of the process of the JSCC encoder and JSCC decoder for transmitting latent representation $\boldsymbol{\mu}_l$.
  • ...and 3 more figures