Editable-DeepSC: Cross-Modal Editable Semantic Communication Systems

Wenbo Yu, Bin Chen, Qinshan Zhang, Shu-Tao Xia

TL;DR

This paper proposes Editable-DeepSC, a novel cross-modal editable semantic communication system that takes text-image pairs as inputs and transmits the edited information of images according to textual instructions, using inversion methods based on StyleGAN priors.

Abstract

Unlike data-oriented communication systems, which focus primarily on accurately transmitting every bit of data, task-oriented semantic communication systems transmit only the specific semantic information required by downstream tasks, striving to minimize communication overhead while maintaining competitive task execution performance in the presence of channel noise. However, in many scenarios the transmitted semantic information needs to be dynamically modified according to users' preferences in a conversational and interactive way, which few existing works take into consideration. In this paper, we propose a novel cross-modal editable semantic communication system, named Editable-DeepSC, to tackle this challenge. By utilizing inversion methods based on StyleGAN priors, Editable-DeepSC takes cross-modal text-image pairs as inputs and transmits the edited information of images based on textual instructions. Extensive numerical results demonstrate that Editable-DeepSC achieves remarkable editing effects and transmission efficiency under the perturbations of channel noise, outperforming existing data-oriented communication methods.

Paper Structure (11 sections, 13 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: The overall architecture of our proposed Editable-DeepSC. Our model mainly consists of the cross-modal codecs and the Semantic Editing Module.
  • Figure 2: The working procedure of the Semantic Editing Module in our proposed Editable-DeepSC. The expected degree of the attribute, e.g., the length of the bangs, is computed from the given encodings $E_{I}$ and $E_{T}$. We model the Semantic Field Function to apply minor modifications to $E_{I}$. By checking whether the predicted attribute matches the target attribute after each refinement of $E_{I}$, fine-grained editing is ultimately realized.
  • Figure 3: Quantitative comparison of different methods on cross-modal language-driven editing tasks. Note that $\uparrow$ indicates that the higher the better and $\downarrow$ indicates that the lower the better.
  • Figure 4: Qualitative comparison of different methods on cross-modal language-driven editing tasks (6 dB SNR). The original images and the textual instructions are presented in the first row. The results of different methods are displayed in the second row.
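
The refinement loop described in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' implementation: the linear attribute predictor, the constant field direction, the step size, and the names predict_degree, semantic_field, and edit_latent are all assumptions made for illustration. The target degree is passed in directly, whereas the paper derives it from the encodings $E_{I}$ and $E_{T}$.

```python
import numpy as np

def predict_degree(e_i, w):
    """Hypothetical attribute predictor: latent code -> integer degree in 0..5."""
    return int(np.clip(np.round(float(w @ e_i)), 0, 5))

def semantic_field(e_i, w):
    """Hypothetical semantic field: unit direction that increases the attribute score.

    A learned field would depend on e_i; this toy version is constant.
    """
    return w / np.linalg.norm(w)

def edit_latent(e_i, target_degree, w, step=0.05, max_steps=1000):
    """Nudge the image encoding in small steps until the predicted degree matches the target."""
    e = e_i.copy()  # leave the caller's encoding untouched
    for n in range(max_steps):
        current = predict_degree(e, w)
        if current == target_degree:
            return e, n  # stop as soon as prediction and target agree
        sign = 1.0 if target_degree > current else -1.0
        e = e + sign * step * semantic_field(e, w)  # one minor modification
    return e, max_steps

# Toy usage: a 4-dimensional encoding whose first coordinate controls the attribute.
w = np.array([1.0, 0.0, 0.0, 0.0])
e_start = np.zeros(4)
e_edited, n_steps = edit_latent(e_start, target_degree=3, w=w)
print(predict_degree(e_start, w), predict_degree(e_edited, w), n_steps)
```

Checking the prediction after every small step, rather than jumping straight to a target code, is what makes the edit fine-grained: the loop stops at the first encoding whose predicted degree equals the target.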