Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G

Nassim Sehad, Lina Bariah, Wassim Hamidouche, Hamed Hellaoui, Riku Jäntti, Mérouane Debbah

TL;DR

The paper investigates leveraging generative AI and semantic communication to enable immersive, multisensory experiences (IoS) over 6G networks. It argues that multimodal LLMs can compress and convey meaning rather than raw signals, achieving substantial bandwidth reductions while highlighting latency challenges. A UAV-based case study demonstrates a two-LLM workflow that converts image-derived semantics into WebXR content and mulsemedia cues, yielding about a 99.93% bandwidth saving but increased end-to-end latency. The work proposes an edge-to-cloud architecture with a dual-LLM setup and discusses open challenges including multi-user scalability, latency, edge constraints, energy usage, and interoperability. It provides a concrete testbed, a semantic streaming framework, and quantitative benchmarks comparing semantic versus conventional immersive media pipelines.

Abstract

Over the past two decades, the Internet-of-Things (IoT) has become a transformative concept, and as we approach 2030, a new paradigm known as the Internet of Senses (IoS) is emerging. Unlike conventional Virtual Reality (VR), IoS seeks to provide multi-sensory experiences, acknowledging that in our physical reality, our perception extends far beyond sight and sound to encompass a range of senses. This article explores the existing technologies driving immersive multi-sensory media, delving into their capabilities and potential applications. This exploration includes a comparative analysis between conventional immersive media streaming and a proposed use case that leverages semantic communication empowered by generative Artificial Intelligence (AI). The focal point of this analysis is the 99.93% reduction in bandwidth consumption achieved by the proposed scheme. Through this comparison, we aim to underscore the practical applications of generative AI for immersive media. We also address major challenges in this field, such as temporal synchronization of multiple media, ensuring high throughput, minimizing the End-to-End (E2E) latency, and robustness to low bandwidth, and we outline future trajectories.
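
To put the reported 99.93% figure in perspective, the short Python sketch below converts a percentage reduction into an equivalent size ratio, and shows how such a percentage would be computed from two bitrates. The function names and the example bitrates are illustrative assumptions, not values or code from the paper; only the 99.93% figure comes from the abstract.

```python
def bandwidth_reduction_pct(conventional_bps: float, semantic_bps: float) -> float:
    """Percentage of bandwidth saved by a semantic stream relative to a conventional one."""
    if conventional_bps <= 0:
        raise ValueError("conventional_bps must be positive")
    return 100.0 * (1.0 - semantic_bps / conventional_bps)


def compression_ratio(reduction_pct: float) -> float:
    """How many times smaller the reduced stream is, given a percentage reduction."""
    remaining = 1.0 - reduction_pct / 100.0  # fraction of the original bandwidth still used
    if remaining <= 0:
        raise ValueError("reduction_pct must be below 100")
    return 1.0 / remaining


if __name__ == "__main__":
    # 99.93% is the reduction reported in the abstract.
    print(f"size ratio: about {compression_ratio(99.93):.0f}x")

    # Hypothetical bitrates chosen only to illustrate the formula (not measured values).
    print(f"reduction: {bandwidth_reduction_pct(10_000_000, 7_000):.2f}%")
```

Read this way, a 99.93% reduction means the semantic stream uses about 0.07% of the conventional stream's bandwidth, i.e. it is on the order of 1,400 times smaller.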

Paper Structure

This paper contains 25 sections, 2 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Key concepts of IoS
  • Figure 2: Architecture of a conventional video streaming system
  • Figure 3: Proposed architecture for GenAI-enabled immersive communication
  • Figure 4: Network latency between components in the architecture
  • Figure 5: Generated 3D view against real captured image
  • ...and 3 more figures