Knowledge Distillation Driven Semantic NOMA for Image Transmission with Diffusion Model

Qifei Wang, Zhen Gao, Shuo Sun, Zhijin Qin, Xiaodong Xu, Meixia Tao

TL;DR

The paper tackles uplink multi-user image transmission under semantic NOMA by proposing KDD-SemNOMA, a framework that couples a ConvNeXt-based DeepJSCC with an Enhanced AF-Module for robust channel adaptation. It introduces a teacher-student knowledge distillation scheme to transfer interference-free feature knowledge from an orthogonal-channel teacher to a SemNOMA student, improving pixel fidelity without increasing inference cost. A diffusion-model-based two-stage refinement further enhances perceptual quality by leveraging generative priors and error contraction in the forward diffusion process. Empirical results on CIFAR-10 and FFHQ-256 show improved PSNR, SSIM, and perceptual metrics (LPIPS, FID) over state-of-the-art baselines under AWGN and Rayleigh channels, demonstrating strong potential for efficient, high-quality multi-user semantic image transmission in 6G scenarios.

Abstract

As a promising 6G enabler beyond conventional bit-level transmission, semantic communication can considerably reduce the required bandwidth resources, but its combination with multiple access requires further exploration. This paper proposes a knowledge distillation-driven and diffusion-enhanced (KDD) semantic non-orthogonal multiple access (NOMA) scheme, named KDD-SemNOMA, for multi-user uplink wireless image transmission. Specifically, to ensure robust feature transmission across diverse transmission conditions, we first develop a ConvNeXt-based deep joint source and channel coding architecture with an enhanced adaptive feature module. This module incorporates signal-to-noise ratio and channel state information to dynamically adapt to additive white Gaussian noise and Rayleigh fading channels. Furthermore, to improve image restoration quality without inference overhead, we introduce a two-stage knowledge distillation strategy, in which a teacher model, trained on interference-free orthogonal transmission, guides a student model via feature affinity distillation and cross-head prediction distillation. Moreover, a diffusion model-based refinement stage leverages generative priors to transform initial SemNOMA outputs into high-fidelity images with enhanced perceptual quality. Extensive experiments on the CIFAR-10 and FFHQ-256 datasets demonstrate superior performance over state-of-the-art methods, delivering satisfactory reconstruction performance even under extremely poor channel conditions. These results highlight the advantages of the proposed scheme in both pixel-level accuracy and perceptual metrics, effectively mitigating interference and enabling high-quality image recovery.
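To make the uplink setting in the abstract concrete, the following is a minimal sketch of a non-orthogonal channel in which all users' encoded symbols are superimposed on the same channel uses, under either AWGN or Rayleigh fading. It is not the paper's implementation: the function names, the unit-power normalization per user, the block-fading assumption (one complex gain per user per transmission), and the SNR definition relative to unit signal power are all assumptions made for illustration.

```python
import math
import random


def normalize_power(symbols):
    """Scale complex symbols so their average power is 1 (assumed constraint)."""
    power = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    scale = 1.0 / math.sqrt(power)
    return [s * scale for s in symbols]


def complex_gaussian(rng, variance):
    """Draw one circularly symmetric complex Gaussian sample, CN(0, variance)."""
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def noma_uplink(user_symbols, snr_db, fading="awgn", seed=0):
    """Superimpose all users' symbols on shared channel uses.

    Returns (received, gains) with received[k] = sum_i h_i * x_i[k] + n[k].
    """
    rng = random.Random(seed)
    # Noise variance for the given SNR, assuming unit signal power per user.
    noise_var = 10.0 ** (-snr_db / 10.0)
    if fading == "rayleigh":
        # Block fading (assumption): one CN(0, 1) gain per user.
        gains = [complex_gaussian(rng, 1.0) for _ in user_symbols]
    else:
        gains = [1.0 + 0.0j for _ in user_symbols]
    normalized = [normalize_power(x) for x in user_symbols]
    received = []
    for k in range(len(normalized[0])):
        superposed = sum(h * x[k] for h, x in zip(gains, normalized))
        received.append(superposed + complex_gaussian(rng, noise_var))
    return received, gains
```

In this sketch the receiver sees only the sum of the users' signals, which is the interference a SemNOMA decoder has to resolve. An orthogonal baseline, such as the teacher's channel described in the abstract, would instead pass each user's symbols through the channel separately.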

Paper Structure

This paper contains 27 sections, 20 equations, 12 figures, 5 tables, 2 algorithms.

Figures (12)

  • Figure 1: Overview of the proposed KDD-SemNOMA Framework.
  • Figure 2: Detailed architecture of the enhanced AF-Module.
  • Figure 3: Architecture of the ConvNeXt-based DeepJSCC Network, where Conv2D $\downarrow 2$ represents 2-dimensional convolution with stride=2 for downsampling.
  • Figure 4: The overall framework of KD-SemNOMA. The teacher model transmits image semantic features over orthogonal channels, while the student model uses a non-orthogonal channel. $l_j$ denotes the $j$-th layer of the decoder, $\hat{x}_i$ denotes the student model output of the $i$-th UE, $\mathbf{f}_{i,0}^t$ denotes the input feature of the teacher decoder, $\mathcal{L}_{\text{MAEi}}$ denotes the MAE loss of the $i$-th UE, $\mathcal{L}_{\text{CrossKDi}}$ denotes the CrossKD loss of the $i$-th UE, and $\mathcal{L}_{\text{FAi}}$ denotes the FA loss of the $i$-th UE.
  • Figure 5: Block diagram of the image refinement based on the pre-trained diffusion model (KDD-SemNOMA).
  • ...and 7 more figures
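
The Figure 4 caption names a feature affinity (FA) loss between teacher and student but this page does not give its formula. The sketch below shows one common way such a loss can be formed, assuming the affinity is the pairwise cosine similarity between feature vectors at different spatial positions and the loss is the mean absolute difference between the student's and teacher's affinity matrices. The function names and this exact formulation are assumptions, not the paper's definition.

```python
import math


def cosine_affinity(features):
    """Pairwise cosine similarity between feature vectors (one per position)."""
    # Guard against zero vectors by falling back to a norm of 1.0.
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in features]
    unit = [[v / n for v in f] for f, n in zip(features, norms)]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def feature_affinity_loss(student_features, teacher_features):
    """Mean absolute difference between student and teacher affinity matrices."""
    a_s = cosine_affinity(student_features)
    a_t = cosine_affinity(teacher_features)
    n = len(a_s)
    total = sum(abs(a_s[i][j] - a_t[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)
```

Because the comparison is made on position-by-position affinity matrices rather than on raw features, this form only requires the student and teacher feature maps to have the same number of spatial positions; their channel widths may differ.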