Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach

Yuxuan Chen, Rongpeng Li, Xiaoxue Yu, Zhifeng Zhao, Honggang Zhang

TL;DR

This study introduces a framework, inspired by model-based reinforcement learning (MBRL), that determines the optimal splitting point of an LLM between the edge and user equipment (UE). A reward surrogate model significantly reduces the computational cost of frequent performance evaluations.

Abstract

Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy and computational efficiency. Toward efficient wireless LLM inference in edge computing, this study comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. On this basis, this study introduces a framework taking inspiration from model-based reinforcement learning (MBRL) to determine the optimal splitting point across the edge and user equipment (UE). By incorporating a reward surrogate model, our approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulations demonstrate that this method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.
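
To make the reward-surrogate idea concrete, here is a minimal sketch of the general pattern: a cheap predictor is fitted on a handful of expensive evaluations and then queried in place of the expensive evaluator. It is not the paper's surrogate model. The nearest-neighbour lookup, the class and function names, and the `(split_point, snr_db)` state layout are all assumptions made for illustration.

```python
class NearestNeighborSurrogate:
    """Toy reward surrogate: returns the reward of the closest stored state.

    Hypothetical stand-in for a learned surrogate; not the authors' model.
    """

    def __init__(self):
        self.samples = []  # list of (state_tuple, reward)

    def add(self, state, reward):
        """Store one expensive evaluation result."""
        self.samples.append((tuple(state), float(reward)))

    def predict(self, state):
        """Cheap reward estimate: reward of the nearest stored state."""
        if not self.samples:
            raise ValueError("surrogate has no samples yet")

        def sq_dist(stored_state):
            return sum((a - b) ** 2 for a, b in zip(stored_state, state))

        nearest_state, reward = min(self.samples, key=lambda sr: sq_dist(sr[0]))
        return reward


def best_split(surrogate, candidates, snr_db):
    """Pick the candidate splitting point with the highest predicted reward."""
    return max(candidates, key=lambda sp: surrogate.predict((sp, snr_db)))
```

In this pattern the expensive evaluator (for example, a full perplexity measurement) is called only to populate `add`, while every later query goes through `predict`.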

Paper Structure

This paper contains 16 sections, 16 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: High-level architecture of the framework, depicting the distribution of the LLM across the edge and UEs and highlighting the role of the RL agent in managing interactions between the LLM and wireless networks.
  • Figure 2: Overview of the split model architecture over a wireless channel, using the 32-layer LLaMA2-7B model as an example with layer 3 as the splitting point.
  • Figure 3: Impact of the splitting layer on perplexity (PPL) for various LLMs under (a) high SNR and (b) low SNR in AWGN, and under (c) low packet loss probability and (d) high packet loss probability in Nakagami-$m$ fading.
  • Figure 4: Illustration of the RL setup, including the LLM, RL agent, and channel noise modules. The RL agent optimizes the splitting point of the LLM by receiving state inputs (noise intensity, Nakagami-$m$ fading shape, and splitting point), computing action probabilities via the policy network, and updating the policy based on the reward function.
  • Figure 5: Comparison of training performances for different RL approaches under Case L, Case H, and Case A.
  • ...and 4 more figures
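
The split-inference setup of Figure 2 and the AWGN condition of Figure 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The toy affine layer, the names `add_awgn` and `split_forward`, and the choice to run the first segment before the channel are all hypothetical.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so the signal-to-noise ratio equals snr_db."""
    if not signal:
        return []
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def toy_layer(v):
    """Placeholder for one transformer layer (assumption: simple affine map)."""
    return [0.9 * a + 0.1 for a in v]


def split_forward(layers, x, split_point, snr_db, rng):
    """Run layers[:split_point], pass the hidden state through an AWGN
    channel, then run the remaining layers on the other side."""
    for layer in layers[:split_point]:
        x = layer(x)
    x = add_awgn(x, snr_db, rng)  # hidden state crosses the wireless link
    for layer in layers[split_point:]:
        x = layer(x)
    return x


# Example: a 32-layer stack split after layer 3, as in Figure 2.
layers = [toy_layer] * 32
output = split_forward(layers, [0.1, 0.2, 0.3], 3, 20.0, random.Random(0))
```

Lowering `snr_db` increases the noise injected at the splitting point, which is the effect whose impact on PPL Figure 3 reports for real models.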