Toward Democratized Generative AI in Next-Generation Mobile Edge Networks

Ruichen Zhang, Jiayi He, Xiaofeng Luo, Dusit Niyato, Jiawen Kang, Zehui Xiong, Yonghui Li, Biplab Sikdar

TL;DR

This paper proposes a model-centric framework for democratizing generative AI deployment on mobile and edge networks. Experimental results highlight the practicality of democratized LLMs, with significant improvements in generalization accuracy, hallucination rate, accessibility, and resource consumption.

Abstract

The rapid development of generative AI technologies, including large language models (LLMs), has brought transformative changes to various fields. However, deploying such advanced models on mobile and edge devices remains challenging due to their high computational, memory, communication, and energy requirements. To address these challenges, we propose a model-centric framework for democratizing generative AI deployment on mobile and edge networks. First, we comprehensively review key compact model strategies, such as quantization, model pruning, and knowledge distillation, and present key performance metrics to optimize generative AI for mobile deployment. Next, we provide a focused review of mobile and edge networks, emphasizing the specific challenges and requirements of these environments. We further conduct a case study demonstrating the effectiveness of these strategies by deploying LLMs on real mobile edge devices. Experimental results highlight the practicality of democratized LLMs, with significant improvements in generalization accuracy, hallucination rate, accessibility, and resource consumption. Finally, we discuss potential research directions to further advance the deployment of generative AI in resource-constrained environments.

Paper Structure

This paper contains 30 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Overview of democratized generative AI. (A) Key compact model strategies, including fine-tuning, model pruning, distillation, quantization, mixture of experts, and caching. (B) Four-dimensional evaluation for democratized generative AI, covering energy efficiency, hallucination rate, generalization accuracy, and accessibility. (C) Table showing the impact of each strategy on the metrics, with arrows indicating whether each metric increases, decreases, or stays roughly constant.
  • Figure 2: Review of representative research works on key compact model strategies in democratized generative AI. The tree diagram shows that many research works have emerged in the past two years to improve and optimize large generative AI models such as LLMs, which will further promote the progress and realization of the democratized generative AI paradigm in next-generation mobile edge networks.
  • Figure 3: Workflow for deploying LLMs on mobile edge devices for real-time Q&A tasks. Key steps include (A) Model Optimization using techniques such as pruning and quantization; (B) Model Compilation to optimize models for hardware compatibility; (C) Deployment and Performance Evaluation with metrics such as accuracy, hallucination rate, and resource usage; (D) Offloading Strategy, where simple tasks are handled locally and complex tasks are offloaded to an edge server, balancing efficiency and scalability.
  • Figure 4: Experimental results of key compact model strategies on the metrics. The test is conducted on a server with an Intel Xeon Platinum 8380 CPU and an NVIDIA A100 80GB GPU. Resource consumption is the average GPU power consumption for responding to a single query of up to 256 tokens.
  • Figure 5: Numerical results of running the democratized LLMs. The mobile edge device is a Xiaomi 10 Ultra smartphone with 12GB of RAM, 256GB of storage, and a Qualcomm Snapdragon 865 processor. The edge server has an Intel Xeon Platinum 8380 CPU and an NVIDIA A100 80GB GPU. The resource consumption of the mobile edge device is the average battery consumption for responding to a single query, while the resource consumption of the edge server is the average GPU power consumption for responding to a single query of up to 256 tokens.
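
The offloading strategy in Figure 3(D), where simple tasks are handled locally and complex tasks are sent to an edge server, can be illustrated with a minimal Python sketch. The captions do not state how the paper decides whether a task is simple or complex, so the word-count heuristic, the `MAX_LOCAL_TOKENS` threshold, and the names `Route`, `estimate_complexity`, and `route_query` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold (not from the paper): queries with more than this
# many whitespace-separated words are treated as "complex".
MAX_LOCAL_TOKENS = 32


@dataclass
class Route:
    target: str  # "device" or "edge_server"
    reason: str  # human-readable explanation of the decision


def estimate_complexity(query: str) -> int:
    """Crude complexity proxy (assumption): count whitespace-separated words."""
    return len(query.split())


def route_query(query: str, max_local_tokens: int = MAX_LOCAL_TOKENS) -> Route:
    """Keep simple queries on the device and offload complex ones."""
    n = estimate_complexity(query)
    if n <= max_local_tokens:
        return Route("device", f"{n} words <= {max_local_tokens}")
    return Route("edge_server", f"{n} words > {max_local_tokens}")


if __name__ == "__main__":
    print(route_query("What is edge computing?").target)
    print(route_query("explain " * 100).target)
```

A real deployment would likely base the decision on richer signals, such as device battery level, link quality, or a learned difficulty estimate; the single threshold here only shows the shape of the routing step.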