Language-Oriented Semantic Latent Representation for Image Transmission
Giordano Cicchetti, Eleonora Grassucci, Jihong Park, Jinho Choi, Sergio Barbarossa, Danilo Comminiello
TL;DR
This paper addresses a weakness of language-only semantic communication, in which an image is encoded as text (I2T) and regenerated from that text (T2I): a caption alone can miss fine visual details. It proposes a framework that transmits both a textual caption $y$ and a compact latent image embedding $z$, and uses a latent diffusion model conditioned on both to reconstruct the image at the receiver. The payload is about 2.09% of the original image size, and perceptual fidelity improves over a text-only baseline, especially in moderate-to-high SNR scenarios. The work points toward adaptive, bandwidth-efficient image transmission and suggests extending semantic communication to other media and to more compact semantic representations.
Abstract
In the new paradigm of semantic communication (SC), the focus is on delivering meanings behind bits by extracting semantic information from raw data. Recent advances in data-to-text models facilitate language-oriented SC, particularly for text-transformed image communication via image-to-text (I2T) encoding and text-to-image (T2I) decoding. However, although semantically aligned, the text is too coarse to precisely capture sophisticated visual features such as spatial locations, color, and texture, incurring a significant perceptual difference between intended and reconstructed images. To address this limitation, we propose a novel language-oriented SC framework that communicates both text and a compressed image embedding and combines them using a latent diffusion model to reconstruct the intended image. Experimental results validate the potential of our approach, which transmits only 2.09% of the original image size while achieving higher perceptual similarity in noisy communication channels than a baseline SC method that communicates only through text. The code is available at https://github.com/ispamm/Img2Img-SC/ .
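
To make the two quantities in the abstract concrete (payload size relative to the raw image, and a noisy channel at a given SNR), the following is a minimal Python sketch. It is not taken from the authors' repository: the function names `payload_ratio` and `awgn`, the uniform bits-per-value accounting, and the additive white Gaussian noise channel model are all assumptions made here for illustration.

```python
import math
import random
from typing import List, Sequence


def payload_ratio(caption: str, latent_len: int, bits_per_latent: int,
                  height: int, width: int, channels: int = 3,
                  bits_per_sample: int = 8) -> float:
    """Return (caption bits + latent bits) / raw image bits.

    Assumption: the caption is sent as UTF-8 bytes and every latent
    value uses the same fixed number of bits.
    """
    caption_bits = 8 * len(caption.encode("utf-8"))
    latent_bits = latent_len * bits_per_latent
    image_bits = height * width * channels * bits_per_sample
    return (caption_bits + latent_bits) / image_bits


def awgn(z: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise to z at the requested SNR (in dB).

    Assumption: SNR is defined as mean signal power over noise power,
    with signal power measured on the vector being transmitted.
    """
    signal_power = sum(v * v for v in z) / len(z)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in z]


if __name__ == "__main__":
    # Illustrative values only; not the configuration used in the paper.
    ratio = payload_ratio("a dog running on a beach", latent_len=4 * 64 * 64,
                          bits_per_latent=8, height=512, width=512)
    print(f"payload / image size = {ratio:.4%}")

    rng = random.Random(0)
    z = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    print(awgn(z, snr_db=10.0, rng=rng))
```

In this sketch the receiver would pass the noisy embedding and the caption to a latent diffusion model as joint conditioning; that step is omitted here because it depends on the specific model and is defined in the authors' code.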
