Optimizing NetGPT via Routing-Based Synergy and Reinforcement Learning

Yuxuan Chen, Rongpeng Li, Xianfu Chen, Celimuge Wu, Chenghui Peng, Zhifeng Zhao, Honggang Zhang

TL;DR

The paper tackles the problem of balancing quality and cost for LLM-based agents deployed at the network edge by introducing a cloud-edge NetGPT framework that couples network-aware routing with on-device self-improvement. It proposes a unified score-threshold routing policy and online RL anchored by an SFT prior to preserve JSON tool-calling schemas, plus two practical instantiations (FuncDyn and PolicyNet) for dynamic fallback thresholds. The authors provide theoretical guarantees for a unique, state-dependent optimal threshold and monotone effects of bandwidth and RTT, and validate the approach with experiments showing smooth quality-cost frontiers and reduced cloud offloading under varying network conditions. The work demonstrates that joint optimization of routing and edge-model adaptation yields robust performance improvements in dynamic network environments, with clear implications for real-world cloud-edge deployments.

Abstract

Large language model (LLM) agents at the network edge offer low-latency execution for routine queries. In contrast, complex requests often require the superior capability of cloud models, incurring higher latency and cost. To navigate this quality-cost trade-off under dynamic network conditions, we propose a cloud-edge synergy for NetGPT that integrates network-aware routing with on-edge self-improvement. Specifically, our framework routes structured tool-calling requests to cloud or edge agents via a novel scoring policy. We prove that, under mild regularity assumptions, the optimal routing rule admits a unique fallback threshold with monotone dependence on bandwidth and round-trip time (RTT). Concurrently, based on the dataset collected from requests routed to the cloud and the corresponding responses, we instantiate a schema-preserving reinforcement learning (RL) scheme to improve the capability of the edge agent. We analyze a supervised finetuning (SFT)-anchored composite objective that combines a reverse-KL trust-region step with a forward-KL realignment toward the SFT prior, explaining stability and constraining policy drift. Both the network-aware routing policy and the edge agent are updated coherently. Experiments across controlled network states and pricing schedules demonstrate smooth quality-cost frontiers, consistent gains of dynamic fallback thresholds over fixed policies, and sustained reductions in offloading while maintaining task success and schema-correct outputs.
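To make the score-threshold routing idea concrete, the following is a minimal sketch, not the authors' FuncDyn or PolicyNet controller. It assumes a scalar edge-confidence score in [0, 1] and a hypothetical linear threshold that rises with bandwidth and falls with RTT. The names `NetworkState`, `fallback_threshold`, and `route`, the coefficients, and the direction of each dependence are all illustrative assumptions; the abstract states only that the optimal threshold depends monotonically on bandwidth and RTT.

```python
from dataclasses import dataclass


@dataclass
class NetworkState:
    """Hypothetical link state observed by the router."""
    bandwidth_mbps: float  # available bandwidth toward the cloud
    rtt_ms: float          # round-trip time to the cloud


def fallback_threshold(state: NetworkState,
                       base: float = 0.5,
                       bw_gain: float = 0.002,
                       rtt_penalty: float = 0.001) -> float:
    """Illustrative state-dependent threshold, clipped to [0, 1].

    Assumption: a better link (more bandwidth, lower RTT) makes offloading
    cheaper, so the threshold rises and more requests fall back to the cloud.
    The linear form and coefficients are placeholders, not from the paper.
    """
    tau = base + bw_gain * state.bandwidth_mbps - rtt_penalty * state.rtt_ms
    return min(1.0, max(0.0, tau))


def route(edge_score: float, state: NetworkState) -> str:
    """Serve on the edge if the score clears the threshold, else fall back."""
    return "edge" if edge_score >= fallback_threshold(state) else "cloud"


good = NetworkState(bandwidth_mbps=100.0, rtt_ms=20.0)
bad = NetworkState(bandwidth_mbps=5.0, rtt_ms=300.0)

# The same request score is routed differently under different link states.
print(route(0.5, good), route(0.5, bad))
```

Under these assumed coefficients, a request with score 0.5 is offloaded on the good link and kept on the edge on the bad link, which is the qualitative behavior a dynamic fallback threshold is meant to capture.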

Paper Structure

This paper contains 20 sections, 4 theorems, 41 equations, 11 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

Under the regularity assumption,

Figures (11)

  • Figure 1: Overview of the proposed cloud-edge pipeline.
  • Figure 2: Online stage of the proposed pipeline.
  • Figure 4: Example of structured text flow across stages.
  • Figure 5: Comparison among All Edge, All Cloud, RouteLLM (Ong et al., 2025), FrugalGPT (Chen et al., 2024), and the dynamic controllers (FuncDyn, PolicyNet) under GOOD/MID/BAD links.
  • Figure 6: Dynamic vs. fixed fallback threshold under time-varying links.
  • ...and 6 more figures

Theorems & Definitions (10)

  • Lemma 1: Frontier sensitivities
  • proof
  • Theorem 1: Unique optimal fallback threshold (first-order balance)
  • proof
  • Remark 1
  • Theorem 2: Network influence on $\tau^*$
  • proof
  • Remark 2
  • Corollary 1: Local sensitivity of $\tau^*(S)$
  • proof