Fusion of Mixture of Experts and Generative Artificial Intelligence in Mobile Edge Metaverse

Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, Shiwen Mao, Dong In Kim

TL;DR

The paper addresses the computational and coherence challenges of content creation in the mobile edge Metaverse by fusing Mixture of Experts (MoE) with Generative AI (GAI). It develops a framework where MoE selective activation governs Discriminative and Generative AI tasks, enabling scalable, edge-friendly video generation through task decomposition and cross-device collaboration. The approach integrates MMVAE and MoE-Fusion concepts to enhance multimodal content generation, validated via case studies using the VBench metric suite. Key findings show improved video quality and consistency when tasks are properly decomposed, while also highlighting training complexity and bandwidth as notable challenges for deployment at scale.

Abstract

In the digital transformation era, the Metaverse offers a fusion of virtual reality (VR), augmented reality (AR), and web technologies to create immersive digital experiences. However, the evolution of the Metaverse is slowed by the challenges of content creation, scalability, and dynamic user interaction. Our study investigates the integration of Mixture of Experts (MoE) models with Generative Artificial Intelligence (GAI) for mobile edge computing to revolutionize content creation and interaction in the Metaverse. Specifically, we harness an MoE model's ability to efficiently manage complex data and tasks by dynamically selecting the most relevant experts, each running a different sub-model, to enhance the capabilities of GAI. We then present a novel framework that improves the quality and consistency of generated video content, and demonstrate its application through case studies. Our findings underscore the efficacy of MoE and GAI integration in redefining virtual experiences by offering a scalable, efficient pathway to realizing the Metaverse's full potential.
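
To make "dynamically selecting the most relevant experts" concrete, the following is a minimal sketch of top-k gating, a common way to implement selective activation in MoE models. It is an illustration under stated assumptions, not the authors' implementation: the function names (`top_k_gate`, `moe_forward`), the toy experts, and the gate scores are all hypothetical, and the paper's framework may route tasks differently.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(scores, k):
    # Hypothetical gate: keep the k highest-scoring experts and
    # renormalize their scores into weights that sum to 1.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    return dict(zip(top, weights))

def moe_forward(x, experts, gate_scores, k=2):
    # Only the selected experts are evaluated; the rest stay idle,
    # which is where the compute saving comes from.
    gate = top_k_gate(gate_scores, k)
    return sum(w * experts[i](x) for i, w in gate.items())

# Toy experts (assumed for illustration): each is a simple scalar function.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x]
gate = top_k_gate([0.1, 2.0, 1.0], k=2)
output = moe_forward(3.0, experts, [0.1, 2.0, 1.0], k=2)
```

Here experts 1 and 2 are selected and expert 0 is never called. In practice the gate scores come from a learned router rather than a fixed list.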

Paper Structure (15 sections, 4 figures)

This paper contains 15 sections, 4 figures.

Figures (4)

  • Figure 1: The mobile edge Metaverse architecture includes three layers. In the Physical layer, sensors collect information from the real world. The Edge layer uses the collected data to generate assets such as images, video, and audio, or to alter existing content, to form services. The Services layer combines all assets into a complete, customized service experience.
  • Figure 2: Applications of MoE in GAI for the Metaverse. For example, the authors in 9980170 proposed a multi-gate MoE framework that employs a GAN architecture, where the generator is a speech enhancement network and the discriminator is a speech quality assessment network. This discriminator predicts quality metrics that guide the speech enhancement network to improve the quality of speech by minimizing the discrepancy between enhanced speech and clean speech.
  • Figure 3: The mobile edge mixture of video generation framework provides a comprehensive approach to creating video content for the Metaverse. It uses an LLM for task decomposition and expert edge devices for video generation, and handles both temporal and spatial video generation tasks. The video content is generated in parts and then merged, illustrating the collaborative, resource-efficient potential of MoE in creating immersive virtual experiences.
  • Figure 5: Comparison of results across videos produced with different merging strategies (temporal and spatial), as well as a video generated on a single device without the proposed MoE framework.
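
The captions for Figures 3 and 5 refer to temporal and spatial merging of video parts generated on separate devices. As a rough sketch of what those two strategies could mean at the data level, the code below treats a clip as a list of frames and a frame as a list of pixel rows. The representation and the function names `merge_temporal` and `merge_spatial` are assumptions made for illustration; the captions do not specify how the framework actually merges outputs.

```python
def merge_temporal(clips):
    # Temporal merge (assumed): play the clips one after another,
    # i.e. concatenate them along the time axis.
    return [frame for clip in clips for frame in clip]

def merge_spatial(clips):
    # Spatial merge (assumed): place same-index frames side by side.
    # Requires all frames to have the same number of rows; clips are
    # truncated to the shortest one so every output frame is complete.
    n_frames = min(len(clip) for clip in clips)
    merged = []
    for t in range(n_frames):
        frames = [clip[t] for clip in clips]
        height = len(frames[0])
        merged.append([sum((f[r] for f in frames), []) for r in range(height)])
    return merged

# Two toy clips, each with two 2x2 frames (values stand in for pixels).
clip_a = [[[1, 1], [1, 1]], [[2, 2], [2, 2]]]
clip_b = [[[3, 3], [3, 3]], [[4, 4], [4, 4]]]

temporal = merge_temporal([clip_a, clip_b])  # 4 frames of size 2x2
spatial = merge_spatial([clip_a, clip_b])    # 2 frames of size 2x4
```

Temporal merging lengthens the video while keeping the frame size; spatial merging keeps the length while widening each frame.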