Editable-DeepSC: Reliable Cross-Modal Semantic Communications for Facial Editing

Bin Chen; Wenbo Yu; Qinshan Zhang; Tianqu Zhuang; Yong Jiang; Shu-Tao Xia

TL;DR

Editable-DeepSC addresses the challenge of real-time cross-modal facial editing over noisy channels by integrating editing operations into the semantic communication pipeline. It combines GAN-inversion-based semantic coding with Joint Editing-Channel Coding and lightweight SNR-aware adapters to transmit only task-relevant facial semantics while enabling precise, user-guided edits. The method achieves higher editing fidelity and better semantic preservation than the baselines while substantially reducing the Channel Bandwidth Ratio ($\rho$), including under high-resolution ($1024\times1024$) and out-of-distribution (OOD) settings. This approach enables efficient, interactive, language-guided facial editing over wireless links, with practical implications for real-time social-media and metaverse applications.

Abstract

Real-time computer vision (CV) plays a crucial role in various real-world applications, whose performance is highly dependent on communication networks. Nonetheless, the data-oriented characteristics of conventional communications often do not align with the special needs of real-time CV tasks. To alleviate this issue, recently emerged semantic communications transmit only task-related semantic information, offering a promising way to address this mismatch. However, the communication challenges associated with Semantic Facial Editing, one of the most important real-time CV applications on social media, remain largely unexplored. In this paper, we fill this gap by proposing Editable-DeepSC, a novel cross-modal semantic communication approach for facial editing. First, we theoretically discuss transmission schemes that handle communication and editing separately, and emphasize the necessity of Joint Editing-Channel Coding (JECC) via iterative attribute matching, which integrates editing into the communication chain to preserve more semantic mutual information. Second, to compactly represent the high-dimensional data, we leverage inversion methods based on pre-trained StyleGAN priors for semantic coding. Third, to handle dynamic channel noise conditions, we propose SNR-aware channel coding via model fine-tuning. Extensive experiments indicate that Editable-DeepSC can achieve superior edits while significantly saving transmission bandwidth, even under high-resolution and out-of-distribution (OOD) settings.
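
To make the two channel-related quantities in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the definition of the Channel Bandwidth Ratio that is common in the joint source-channel coding literature (channel symbols transmitted per source value) and an additive white Gaussian noise (AWGN) channel; the function names `bandwidth_ratio` and `awgn`, the real-valued symbols, and the 512-dimensional latent code are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List


def bandwidth_ratio(num_channel_symbols: int, height: int, width: int,
                    channels: int = 3) -> float:
    """Channel symbols sent per source value (assumed definition of rho)."""
    return num_channel_symbols / (height * width * channels)


def awgn(symbols: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise scaled so the expected SNR equals snr_db."""
    signal_power = sum(s * s for s in symbols) / len(symbols)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in symbols]


rng = random.Random(0)
# Hypothetical 512-dimensional latent code standing in for the encoded face.
latent = [rng.gauss(0.0, 1.0) for _ in range(512)]
received = awgn(latent, snr_db=10.0, rng=rng)

# Ratio for sending the latent instead of a 1024x1024 RGB image.
rho = bandwidth_ratio(len(latent), 1024, 1024)
print(f"rho = {rho:.6f}")
```

Under these assumptions, sending a compact latent code instead of the raw pixels is what drives $\rho$ down, and an SNR-aware channel coder would be one that adapts to the `snr_db` value the channel currently exhibits.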
