Characterizing Network Requirements for GPU API Remoting in AI Applications
Tianxia Wang, Zhuofu Chen, Xingda Wei, Jinyu Gu, Rong Chen, Haibo Chen
TL;DR
This work tackles the problem of sizing network resources for GPU API remoting in AI workloads, aiming to keep remoting overhead within a budget $\varepsilon$. It introduces a GPU-centric design and a formal cost model, underpinned by two optimization principles: asynchronous outstanding requests (OR) and shadow descriptors (SR), which together convert many synchronous APIs to asynchronous ones and overlap CPU and GPU execution. The authors validate their approach through emulation and on real RDMA-enabled hardware. They derive network requirements from the constraint $\mathrm{Cost}(\mathrm{APP}) \le \varepsilon$ and show that latency in the range $5$–$20\,\mu$s with hundreds of Gbps of bandwidth suffices for many models, with overhead often below 5% and some workloads even running faster than they do locally. They also provide an open-source remoting system and analytical tools, offering practical guidance for data-center network provisioning and enabling efficient AI remoting on commodity networks.
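To make the idea of sizing a network against a budget $\varepsilon$ concrete, here is a minimal Python sketch. It is not the paper's cost model. It assumes a deliberately simplified one in which each synchronous call exposes a full round trip plus its transfer time, while an asynchronous call hides its transfer behind the GPU work it triggers. The names `ApiCall`, `remoting_overhead`, and `max_rtt_within_budget` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ApiCall:
    """One remoted GPU API call in a hypothetical trace."""
    sync: bool          # True if the caller must wait for the reply
    payload_bytes: int  # bytes shipped to the remote GPU with this call
    gpu_time_us: float  # GPU execution time triggered by the call


def remoting_overhead(calls: List[ApiCall], rtt_us: float,
                      bandwidth_gbps: float) -> float:
    """Toy estimate of the relative slowdown versus local execution.

    Assumption (not from the paper): a sync call exposes RTT plus
    transfer time; an async call exposes only the part of its transfer
    that is longer than the GPU work it overlaps with.
    """
    bytes_per_us = bandwidth_gbps * 1e9 / 8 / 1e6  # Gbps -> bytes per microsecond
    local_us = sum(c.gpu_time_us for c in calls)
    extra_us = 0.0
    for c in calls:
        transfer_us = c.payload_bytes / bytes_per_us
        if c.sync:
            extra_us += rtt_us + transfer_us
        else:
            extra_us += max(0.0, transfer_us - c.gpu_time_us)
    return extra_us / local_us


def max_rtt_within_budget(calls: List[ApiCall], bandwidth_gbps: float,
                          eps: float,
                          candidates_us: Iterable[float]) -> Optional[float]:
    """Largest candidate RTT whose estimated overhead stays within eps."""
    ok = [r for r in candidates_us
          if remoting_overhead(calls, r, bandwidth_gbps) <= eps]
    return max(ok) if ok else None


if __name__ == "__main__":
    # Hypothetical trace: 100 calls, one in ten synchronous.
    trace = [ApiCall(sync=(i % 10 == 0), payload_bytes=1000, gpu_time_us=100.0)
             for i in range(100)]
    print(max_rtt_within_budget(trace, 100.0, 0.05, [5, 10, 20, 50, 100]))
```

In this toy model the tolerable round-trip time grows as the share of synchronous calls shrinks. That is the intuition behind converting synchronous APIs to asynchronous ones, although the real model accounts for considerably more than this sketch does.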
Abstract
GPU remoting is a promising technique for supporting AI applications. Networking plays a key role in enabling remoting. However, the network requirements for efficient remoting, in terms of latency and bandwidth, are unknown. In this paper, we take a GPU-centric approach to derive the minimum latency and bandwidth requirements for GPU remoting, while ensuring no (or little) performance degradation for AI applications. Our study, which includes a theoretical model, demonstrates that, with a careful remoting design, unmodified AI applications can run in a remoting setup on commodity networking hardware without any overhead, or even with better performance, and with low network demands.
