Knowledge Distillation Driven Semantic NOMA for Image Transmission with Diffusion Model
Qifei Wang, Zhen Gao, Shuo Sun, Zhijin Qin, Xiaodong Xu, Meixia Tao
TL;DR
The paper tackles uplink multi-user image transmission under semantic NOMA by proposing KDD-SemNOMA, a framework that couples a ConvNeXt-based DeepJSCC with an Enhanced AF-Module for robust channel adaptation. It introduces a teacher-student knowledge distillation scheme to transfer interference-free feature knowledge from an orthogonal-channel teacher to a SemNOMA student, improving pixel fidelity without increasing inference cost. A diffusion-model-based two-stage refinement further enhances perceptual quality by leveraging generative priors and error contraction in the forward diffusion process. Empirical results on CIFAR-10 and FFHQ-256 show improved PSNR, SSIM, and perceptual metrics (LPIPS, FID) over state-of-the-art baselines under AWGN and Rayleigh channels, demonstrating strong potential for efficient, high-quality multi-user semantic image transmission in 6G scenarios.
Abstract
As a promising 6G enabler beyond conventional bit-level transmission, semantic communication can considerably reduce the required bandwidth, but its combination with multiple access requires further exploration. This paper proposes a knowledge distillation-driven and diffusion-enhanced (KDD) semantic non-orthogonal multiple access (NOMA) scheme, named KDD-SemNOMA, for multi-user uplink wireless image transmission. Specifically, to ensure robust feature transmission across diverse transmission conditions, we first develop a ConvNeXt-based deep joint source and channel coding (DeepJSCC) architecture with an enhanced adaptive feature module. This module incorporates the signal-to-noise ratio (SNR) and channel state information to adapt dynamically to additive white Gaussian noise (AWGN) and Rayleigh fading channels. Furthermore, to improve image restoration quality without inference overhead, we introduce a two-stage knowledge distillation strategy in which a teacher model, trained on interference-free orthogonal transmission, guides a student model via feature affinity distillation and cross-head prediction distillation. Moreover, a diffusion model-based refinement stage leverages generative priors to transform initial SemNOMA outputs into high-fidelity images with enhanced perceptual quality. Extensive experiments on the CIFAR-10 and FFHQ-256 datasets demonstrate superior performance over state-of-the-art methods, with satisfactory reconstruction even under extremely poor channel conditions. These results highlight advantages in both pixel-level accuracy and perceptual metrics, showing that the scheme effectively mitigates interference and enables high-quality image recovery.
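
The uplink setting described above, in which several users transmit over the same channel resource and the receiver observes their superposition, can be illustrated with a minimal sketch. It uses the textbook model y = sum_k h_k x_k + n, with h_k = 1 for AWGN and h_k drawn from CN(0, 1) for Rayleigh block fading. The function names (`power_normalize`, `complex_gaussian`, `noma_uplink`), the unit-power normalization, and the SNR convention (noise power relative to unit per-user signal power) are assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def power_normalize(z):
    """Scale a list of complex symbols to unit average power."""
    avg_power = sum(abs(s) ** 2 for s in z) / len(z)
    scale = math.sqrt(avg_power)
    return [s / scale for s in z]


def complex_gaussian(rng, variance):
    """Draw one circularly symmetric complex Gaussian sample CN(0, variance)."""
    sigma = math.sqrt(variance / 2)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def noma_uplink(user_symbols, snr_db, fading="awgn", rng=None):
    """Superimpose all users' symbols on one shared channel (hypothetical helper).

    user_symbols: list of K equal-length lists of complex symbols.
    Returns (y, h): the received symbols and the per-user channel gains.
    """
    rng = rng or random.Random()
    n = len(user_symbols[0])
    if any(len(s) != n for s in user_symbols):
        raise ValueError("all users must send the same number of symbols")

    # Assumption: each user is normalized to unit average transmit power.
    x = [power_normalize(s) for s in user_symbols]

    if fading == "awgn":
        h = [1 + 0j for _ in x]
    elif fading == "rayleigh":
        # Block fading: one CN(0, 1) gain per user for the whole block.
        h = [complex_gaussian(rng, 1.0) for _ in x]
    else:
        raise ValueError(f"unknown fading model: {fading}")

    # Assumption: SNR is defined against unit per-user signal power.
    noise_var = 10 ** (-snr_db / 10)
    y = [
        sum(h_k * x_k[t] for h_k, x_k in zip(h, x)) + complex_gaussian(rng, noise_var)
        for t in range(n)
    ]
    return y, h


if __name__ == "__main__":
    rng = random.Random(0)
    # Two users, each with 8 stand-in "semantic feature" symbols.
    users = [
        [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
        for _ in range(2)
    ]
    y, h = noma_uplink(users, snr_db=10, fading="rayleigh", rng=rng)
    print(len(y), len(h))
```

In this picture, the receiver must separate both users' images from the single observation `y`, whereas an interference-free orthogonal teacher would see each `h_k * x_k` plus noise on its own resource. That gap is what the distillation stage is meant to narrow.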
