Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach

Yuxuan Chen; Rongpeng Li; Xiaoxue Yu; Zhifeng Zhao; Honggang Zhang

TL;DR

This study introduces a framework inspired by model-based reinforcement learning (MBRL) that determines the optimal point at which to split an LLM between the edge and user equipment (UE). A reward surrogate model significantly reduces the computational cost of the frequent performance evaluations this requires.

Abstract

Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy and computational efficiency. Toward efficient wireless LLM inference in edge computing, this study comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. Building on this analysis, it introduces a framework inspired by model-based reinforcement learning (MBRL) to determine the optimal splitting point across the edge and user equipment (UE). By incorporating a reward surrogate model, the approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulations demonstrate that the method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.
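
The reward surrogate idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `expensive_reward` is a toy stand-in for a costly evaluation (not the paper's reward function), and a bucketed lookup table replaces whatever learned surrogate the authors actually use. The only point shown is that repeated queries for similar states can be answered without re-running the expensive evaluation.

```python
import random

NUM_LAYERS = 32  # matches the 32-layer LLaMA2-7B example; otherwise arbitrary


def expensive_reward(split_layer, snr_db):
    """Toy stand-in for a costly evaluation (hypothetical, not the paper's reward)."""
    noise_penalty = max(0.0, 20.0 - snr_db) / (1.0 + split_layer)
    compute_penalty = 0.05 * split_layer
    return -(noise_penalty + compute_penalty)


class RewardSurrogate:
    """Crude surrogate: mean observed reward per (split layer, SNR bucket)."""

    def __init__(self, bucket_db=5.0):
        self.bucket_db = bucket_db
        self.sums = {}
        self.counts = {}

    def _key(self, split_layer, snr_db):
        return (split_layer, int(snr_db // self.bucket_db))

    def update(self, split_layer, snr_db, reward):
        key = self._key(split_layer, snr_db)
        self.sums[key] = self.sums.get(key, 0.0) + reward
        self.counts[key] = self.counts.get(key, 0) + 1

    def predict(self, split_layer, snr_db):
        """Return the bucket's mean reward, or None if the bucket is unseen."""
        key = self._key(split_layer, snr_db)
        if key not in self.counts:
            return None
        return self.sums[key] / self.counts[key]


def reward_with_surrogate(surrogate, split_layer, snr_db, stats):
    """Use the surrogate when it has an estimate; otherwise evaluate and record."""
    estimate = surrogate.predict(split_layer, snr_db)
    if estimate is not None:
        stats["surrogate_hits"] += 1
        return estimate
    stats["expensive_calls"] += 1
    reward = expensive_reward(split_layer, snr_db)
    surrogate.update(split_layer, snr_db, reward)
    return reward


# Usage: 2000 reward queries at random split points and SNRs.
rng = random.Random(0)
stats = {"expensive_calls": 0, "surrogate_hits": 0}
surrogate = RewardSurrogate()
for _ in range(2000):
    layer = rng.randrange(1, NUM_LAYERS + 1)
    snr = rng.uniform(0.0, 20.0)
    reward_with_surrogate(surrogate, layer, snr, stats)
print(stats)
```

In this toy setting the number of expensive calls is bounded by the number of distinct buckets, so most of the 2000 queries are served by the surrogate. A learned regressor would trade that hard bound for better generalization between nearby states.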

Paper Structure (16 sections, 16 equations, 9 figures, 4 tables, 1 algorithm)

Introduction
Related Works
Edge-Enhanced LLM Deployment
Split Inference in Distributed Computing
Reinforcement Learning in Network Optimization
System Model and Problem Formulation
System Model
Problem Formulation
Reinforcement Learning for Splitting Point Optimization
The Markov Decision Process
Proximal Policy Optimization
The Reward Surrogate Model for Faster RL
Simulation Settings and Experimental Results
Experimental Setup
Experiment Results
...and 1 more section

Figures (9)

Figure 1: High-level architecture of the framework, showing how the LLM is distributed across the edge and UEs and how the RL agent manages interactions between the LLM and wireless networks.
Figure 2: Overview of the split model architecture over a wireless channel, using the 32-layer LLaMA2-7B model as an example with layer 3 as the splitting point.
Figure 3: Impact on perplexity (PPL) of splitting at different layers for various LLMs under (a) high SNR and (b) low SNR in AWGN, and under (c) low and (d) high packet loss probability with Nakagami-$m$ fading.
Figure 4: The RL setup, comprising the LLM, RL agent, and channel noise modules. The RL agent optimizes the splitting point of the LLM by receiving state inputs (noise intensity, Nakagami-$m$ fading shape, and splitting point), computing action probabilities via the policy network, and updating the policy based on the reward function.
Figure 5: Comparison of training performances for different RL approaches under Case L, Case H, and Case A.
...and 4 more figures
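
The split-inference setting described in the captions for Figures 2 and 3 can be sketched as follows, for the AWGN case only. This is a minimal sketch under stated assumptions: the "layers" are toy list-to-list functions rather than transformer blocks, `add_awgn` and `split_forward` are hypothetical helper names, and the noise is scaled from the measured power of the transmitted activations. Nakagami-$m$ fading and packet loss are not modeled.

```python
import math
import random


def add_awgn(values, snr_db, rng):
    """Add zero-mean Gaussian noise so signal power / noise power matches snr_db."""
    signal_power = sum(v * v for v in values) / len(values)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in values]


def split_forward(layers, x, split_layer, snr_db, rng):
    """Run the first split_layer layers, send activations over AWGN, run the rest."""
    for layer in layers[:split_layer]:
        x = layer(x)
    x = add_awgn(x, snr_db, rng)  # the wireless hop between the two devices
    for layer in layers[split_layer:]:
        x = layer(x)
    return x


# Toy stand-ins for model layers (assumption: real layers are transformer blocks).
toy_layers = [
    lambda v: [2.0 * t for t in v],
    lambda v: [t + 1.0 for t in v],
    lambda v: [0.5 * t for t in v],
]

rng = random.Random(0)
clean = split_forward(toy_layers, [1.0, 2.0, 3.0], 1, 200.0, rng)  # near-noiseless
noisy = split_forward(toy_layers, [1.0, 2.0, 3.0], 1, 0.0, rng)    # 0 dB SNR
print(clean, noisy)
```

Sweeping `split_layer` and `snr_db` in a loop of this shape, with a real model and a perplexity metric in place of the toy layers, is one way to reproduce the kind of per-layer comparison that Figure 3 reports.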
