Wireless Video Semantic Communication with Decoupled Diffusion Multi-frame Compensation
Bingyan Xie, Yongpeng Wu, Yuxuan Shi, Biqian Feng, Wenjun Zhang, Jihong Park, Tony Quek
TL;DR
This paper presents WVSC-D, a semantic-level wireless video transmission framework that combines deep semantic video coding with a decoupled diffusion multi-frame compensation mechanism. By transmitting a reference semantic I frame and residual semantic P frames, and by polishing P frames at the receiver through GMFC and the decoupled diffusion process (DDMFC), the approach achieves notable bitrate savings and robust performance under wireless channel impairments. Key contributions include the semantic I/P frame scheme, GMFC for generation-based compensation, and the decoupled diffusion architecture that shares base noise across a GoP while generating frame-specific residuals; together they yield improvements over state-of-the-art DL-based and traditional schemes in PSNR and perceptual metrics. The proposed method demonstrates practical impact for low-latency wireless video transmission, edge computing, and IoT scenarios, with potential extension to multi-modal data.
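One way to picture the decoupled noise described above is as a single base noise sample shared by every frame in a GoP, plus an independent residual noise sample per frame. The sketch below illustrates only that idea. The function name `decoupled_gop_noise`, the mixing weight `base_weight`, and the square-root mixing rule are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def decoupled_gop_noise(num_frames, frame_size, base_weight=0.5, seed=0):
    """Draw one shared base noise vector and mix it into per-frame noise.

    Hypothetical mixing rule (an assumption, not the paper's formulation):
        noise_t = sqrt(w) * base + sqrt(1 - w) * residual_t
    which keeps every element at unit variance for 0 <= w <= 1.
    """
    rng = random.Random(seed)
    # Base noise: sampled once and shared by all frames in the GoP.
    base = [rng.gauss(0.0, 1.0) for _ in range(frame_size)]
    a = math.sqrt(base_weight)
    b = math.sqrt(1.0 - base_weight)
    frames = []
    for _ in range(num_frames):
        # Residual noise: sampled independently for each frame.
        residual = [rng.gauss(0.0, 1.0) for _ in range(frame_size)]
        frames.append([a * s + b * r for s, r in zip(base, residual)])
    return base, frames


base, frames = decoupled_gop_noise(num_frames=4, frame_size=6)
print(len(frames), len(frames[0]))  # 4 6
```

In this sketch, `base_weight=1.0` makes all frames share identical noise, while `base_weight=0.0` makes them fully independent. Intermediate values trade temporal consistency against per-frame variation.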
Abstract
Existing wireless video transmission schemes perform video coding directly at the pixel level and neglect the semantics contained in videos. In this paper, we propose a wireless video semantic communication framework with decoupled diffusion multi-frame compensation (DDMFC), abbreviated as WVSC-D, which integrates the idea of semantic communication into wireless video transmission scenarios. WVSC-D first encodes the original video frames as semantic frames and then conducts video coding on these compact representations, so that video coding takes place at the semantic level rather than the pixel level. Moreover, to further reduce the communication overhead, a reference semantic frame is introduced to replace the per-frame motion vectors used in common video coding methods. At the receiver, DDMFC generates the compensated current semantic frame through a two-stage conditional diffusion process. With both reference frame transmission and DDMFC frame compensation, bandwidth efficiency improves while video transmission performance remains satisfactory. Experimental results verify the performance gain of WVSC-D over other DL-based methods, e.g., about 1.8 dB in PSNR over DVSC.
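The reference-plus-residual idea and the PSNR metric mentioned above can be sketched in a few lines. This is a toy illustration under simplifying assumptions: semantic frames are plain lists of floats, each residual is taken as a direct difference against the reference frame, and there is no learned encoder, wireless channel, or diffusion-based compensation. The helper names `encode_gop`, `decode_gop`, and `psnr` are hypothetical.

```python
import math


def encode_gop(semantic_frames):
    """Split a GoP into a reference frame and per-frame residuals.

    Assumption: the first frame serves as the reference, and each residual
    is the element-wise difference between a later frame and the reference.
    """
    reference = list(semantic_frames[0])
    residuals = [
        [x - r for x, r in zip(frame, reference)]
        for frame in semantic_frames[1:]
    ]
    return reference, residuals


def decode_gop(reference, residuals):
    """Rebuild every frame by adding its residual back onto the reference."""
    frames = [list(reference)]
    for res in residuals:
        frames.append([r + d for r, d in zip(reference, res)])
    return frames


def psnr(original, reconstructed, peak=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


gop = [[0.5, 0.25, 0.75], [0.5, 0.5, 0.75], [0.25, 0.5, 1.0]]
reference, residuals = encode_gop(gop)
decoded = decode_gop(reference, residuals)
print(psnr(gop[1], decoded[1]))
```

Since PSNR is logarithmic in the mean squared error, a gain of about 1.8 dB at a fixed peak value corresponds to an MSE that is lower by a factor of roughly 10^(-0.18) ≈ 0.66.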
