Mobile Edge Intelligence for Large Language Models: A Contemporary Survey

Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang

TL;DR

This survey analyzes the convergence of mobile edge intelligence (MEI) and large language models (LLMs), arguing that cloud-only deployment cannot meet the latency, bandwidth, and privacy requirements of next-generation networks. It introduces MEI4LLM, an AI-native architecture in which edge devices and edge servers collaborate on caching, training, and inference of LLMs, using parameter sharing and parameter-efficient fine-tuning (PEFT) to cut storage and communication costs. The paper details concrete mechanisms for edge caching and delivery, training (centralized, federated, split, hierarchical), and inference (centralized, split, collaborative), and discusses challenges and opportunities in green energy, security, and quality-aware learning. By outlining a comprehensive MEI4LLM blueprint and reviewing state-of-the-art techniques, the work provides a practical roadmap for deploying privacy-preserving, low-latency LLM services at the network edge. It emphasizes that joint optimization of communication and computation, model partitioning, and edge collaboration are needed to support large-scale, edge-resident LLMs in 6G networks.

Abstract

On-device large language models (LLMs), referring to running LLMs on edge devices, have raised considerable interest since they are more cost-effective, latency-efficient, and privacy-preserving compared with the cloud paradigm. Nonetheless, the performance of on-device LLMs is intrinsically constrained by resource limitations on edge devices. Sitting between cloud and on-device AI, mobile edge intelligence (MEI) presents a viable solution by provisioning AI capabilities at the edge of mobile networks, enabling end users to offload heavy AI computation to capable edge servers nearby. This article provides a contemporary survey on harnessing MEI for LLMs. We begin by illustrating several killer applications to demonstrate the urgent need for deploying LLMs at the network edge. Next, we present the preliminaries of LLMs and MEI, followed by resource-efficient LLM techniques. We then present an architectural overview of MEI for LLMs (MEI4LLM), outlining its core components and how it supports the deployment of LLMs. Subsequently, we delve into various aspects of MEI4LLM, extensively covering edge LLM caching and delivery, edge LLM training, and edge LLM inference. Finally, we identify future research opportunities. We hope this article inspires researchers in the field to leverage mobile edge computing to facilitate LLM deployment, thereby unleashing the potential of LLMs across various privacy- and delay-sensitive applications.

Paper Structure

This paper contains 73 sections, 17 figures, 7 tables.

Figures (17)

  • Figure 1: The outline of this survey.
  • Figure 2: Killer LLM-empowered applications demonstrating the need for deploying LLMs at the network edge, with the corresponding latency and bandwidth requirements in practical cases.
  • Figure 3: The Transformer architecture, adapted from Vaswani et al. (2017).
  • Figure 4: The decoder-only LLM architecture, which is adopted by GPT models (Radford et al., 2018).
  • Figure 5: The structure of a multimodal LLM.
  • ...and 12 more figures