Optimizing NetGPT via Routing-Based Synergy and Reinforcement Learning
Yuxuan Chen, Rongpeng Li, Xianfu Chen, Celimuge Wu, Chenghui Peng, Zhifeng Zhao, Honggang Zhang
TL;DR
The paper tackles the problem of balancing quality and cost for LLM-based agents deployed at the network edge by introducing a cloud-edge NetGPT framework that couples network-aware routing with on-device self-improvement. It proposes a unified score-threshold routing policy and online RL anchored by an SFT prior to preserve JSON tool-calling schemas, plus two practical instantiations (FuncDyn and PolicyNet) for dynamic fallback thresholds. The authors provide theoretical guarantees for a unique, state-dependent optimal threshold and monotone effects of bandwidth and RTT, and validate the approach with experiments showing smooth quality-cost frontiers and reduced cloud offloading under varying network conditions. The work demonstrates that joint optimization of routing and edge-model adaptation yields robust performance improvements in dynamic network environments, with clear implications for real-world cloud-edge deployments.
Abstract
Large language model (LLM) agents at the network edge offer low-latency execution for routine queries. In contrast, complex requests often require the superior capability of cloud models, incurring higher latency and cost. To navigate this quality-cost trade-off under dynamic network conditions, we propose a cloud-edge synergy for NetGPT that integrates network-aware routing with on-edge self-improvement. Specifically, our framework routes structured tool-calling requests to cloud or edge agents via a novel scoring policy. We prove that, under mild regularity assumptions, the optimal routing rule admits a unique fallback threshold with monotone dependence on bandwidth and round-trip time (RTT). Concurrently, using the dataset of requests routed to the cloud and their corresponding responses, we instantiate a schema-preserving reinforcement learning (RL) procedure to improve the capability of the edge agent. We analyze a supervised finetuning (SFT)-anchored composite objective that combines a reverse-KL trust-region step with a forward-KL realignment toward the SFT prior, explaining stability and constraining policy drift. Both the network-aware routing policy and the edge agent are updated coherently. Experiments across controlled network states and pricing schedules demonstrate smooth quality-cost frontiers, consistent gains of dynamic fallback thresholds over fixed policies, and sustained reductions in offloading while maintaining task success and schema-correct outputs.
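To make the idea of a network-dependent fallback threshold concrete, the following is a minimal sketch rather than the paper's method. Everything in it is an assumption introduced for illustration: the names `NetworkState`, `fallback_threshold`, and `route`, the linear functional form, the coefficient values, and the convention that a request stays on the edge when its score meets the threshold. The abstract states only that the optimal threshold is monotone in bandwidth and RTT. The direction used here (more bandwidth favors offloading, higher RTT discourages it) is an intuitive guess, and the linear rule is a stand-in for, not a reproduction of, FuncDyn or PolicyNet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkState:
    """Hypothetical snapshot of the link between edge and cloud."""
    bandwidth_mbps: float  # measured bandwidth to the cloud
    rtt_ms: float          # measured round-trip time to the cloud


def fallback_threshold(net: NetworkState,
                       base: float = 0.5,
                       bw_gain: float = 0.002,
                       rtt_penalty: float = 0.001) -> float:
    """Illustrative state-dependent threshold, clipped to [0, 1].

    Assumed direction: the threshold rises with bandwidth (offloading is
    cheaper, so the edge must score higher to keep the request) and falls
    with RTT (offloading is slower, so the edge keeps more requests).
    """
    raw = base + bw_gain * net.bandwidth_mbps - rtt_penalty * net.rtt_ms
    return min(1.0, max(0.0, raw))


def route(edge_score: float, net: NetworkState) -> str:
    """Keep the request on the edge if its score meets the threshold,
    otherwise fall back to the cloud (assumed convention)."""
    return "edge" if edge_score >= fallback_threshold(net) else "cloud"


if __name__ == "__main__":
    good_link = NetworkState(bandwidth_mbps=100.0, rtt_ms=20.0)
    poor_link = NetworkState(bandwidth_mbps=5.0, rtt_ms=300.0)
    for name, net in [("good", good_link), ("poor", poor_link)]:
        print(name, round(fallback_threshold(net), 3), route(0.6, net))
```

With these illustrative coefficients, the same edge score of 0.6 is offloaded on the fast link and kept local on the slow one, which is the qualitative behavior a dynamic threshold is meant to provide over a fixed one.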
