LaMI-GO: Latent Mixture Integration for Goal-Oriented Communications Achieving High Spectrum Efficiency

Achintha Wijesinghe, Suchinthaka Wanninayaka, Weiwei Wang, Yu-Chieh Chao, Songyang Zhang, Zhi Ding

TL;DR

LaMI-GO introduces a latent-domain GO-COM framework that combines a shared codebook with latent diffusion (via a text-conditioned Paella backbone) to achieve high spectrum efficiency for goal-oriented tasks. It replaces pixel-domain reconstruction with a latent index-based representation and employs latent mixture integration to recover images without retraining, achieving strong perceptual quality and improved downstream task performance under constrained bandwidth. The approach demonstrates robustness to channel noise and packet loss, outperforming state-of-the-art GO-COM methods in reconstruction quality and bandwidth efficiency, while maintaining practical showtime speeds. This work highlights the potential of latent diffusion with structured masking strategies (PRM, PDM, EBM) for scalable, privacy-preserving, and task-driven communications in future wireless systems.

Abstract

The recent rise of semantic-style communications includes the development of goal-oriented communications (GO-COMs), which are remarkably efficient for multimedia information transmission. The concept of GO-COMs leverages advanced artificial intelligence (AI) tools to address the rising demand for bandwidth efficiency in applications such as edge computing and the Internet of Things (IoT). Unlike traditional communication systems, which focus on source data accuracy, GO-COMs provide intelligent message delivery that caters to the specific needs of the downstream tasks at the receiver. In this work, we present a novel GO-COM framework, namely LaMI-GO, that utilizes emerging generative AI for better quality-of-service (QoS) with ultra-high communication efficiency. Specifically, we design our LaMI-GO system backbone based on a latent diffusion model followed by a vector-quantized generative adversarial network (VQGAN) for efficient latent embedding and information representation. The system trains a common feature codebook at the receiver side. Our experimental results demonstrate substantial improvement in perceptual quality, accuracy of downstream tasks, and bandwidth consumption over state-of-the-art GO-COM systems and establish the power of our proposed LaMI-GO communication framework.

Paper Structure

This paper contains 31 sections, 15 equations, 16 figures, 10 tables, 2 algorithms.

Figures (16)

  • Figure 1: Overall architecture of the proposed LaMI-GO. Transmitter: an encoder model extracts the essential information of a given image; this information is quantized using a learned dictionary and then masked. Concurrently, a text extractor obtains text information from the same image, which is compressed along with the sequence of codeword indices. Receiver: the GO message interpreter recovers the codeword sequence and the text; the information is then regenerated by a latent diffusion model conditioned on text embeddings, with the recovered latent vectors as the input. LaMI-GO proposes a latent mixture integration strategy for image recovery.
  • Figure 2: Channel noise model. We consider digital communication with packet transmission.
  • Figure 3: The iterative process of latent mixture integration. Here, $M'$ represents $1_{h' \times w'} - M$.
  • Figure 4: The showtime on the receiver end.
  • Figure 5: Original image.
  • ...and 11 more figures
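
The captions for Figures 1 and 3 mention quantization against a learned dictionary, masking, and a complementary mask $M' = 1_{h' \times w'} - M$. The following is a minimal NumPy sketch of how those pieces could fit together, not the authors' implementation. It makes several assumptions that are not stated above: nearest-codeword lookup uses squared Euclidean distance, $M = 1$ marks positions whose codewords were received, and the mixture is the elementwise blend $M \odot z_{\text{received}} + M' \odot z_{\text{generated}}$. The names `quantize` and `mix_latents` are hypothetical, and a zero array stands in for the diffusion model's output. Figure 3 describes an iterative process; only a single mixing step is shown here.

```python
import numpy as np

def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest codeword.

    latents: (h, w, d) array; codebook: (K, d) array.
    Assumption: nearest means smallest squared Euclidean distance.
    """
    flat = latents.reshape(-1, latents.shape[-1])
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).reshape(latents.shape[:2])

def mix_latents(received, generated, mask):
    """Blend two latent grids with a binary mask M and its complement M' = 1 - M.

    Assumption: mask == 1 keeps the received latent, mask == 0 takes the
    generated one.
    """
    m = mask[..., None].astype(received.dtype)  # broadcast over the feature axis
    return m * received + (1.0 - m) * generated

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))      # K = 16 codewords of dimension 4
latents = rng.normal(size=(3, 3, 4))     # h' = w' = 3 latent grid

indices = quantize(latents, codebook)    # what a transmitter would send
received = codebook[indices]             # receiver-side lookup in the shared codebook
generated = np.zeros_like(received)      # placeholder for a generative model's output
mask = np.array([[1, 0, 1],
                 [0, 1, 0],
                 [1, 0, 1]])
mixed = mix_latents(received, generated, mask)
```

Under these assumptions, only the integer `indices` (and the mask pattern) would need to cross the channel, since both ends hold the same codebook.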