
WWW.Serve: Interconnecting Global LLM Services through Decentralization

Huanyu Wang, Ziyu Xia, Zhuoming Chen, Beidi Chen

Abstract

Large language model (LLM) services are mostly centralized, leading to scalability bottlenecks and underutilization of substantial scattered GPU resources. While decentralization offers a promising alternative, existing frameworks primarily focus on cooperation among GPU providers while overlooking their inherent competitive dynamics, imposing substantial constraints such as excessive platform-level oversight or rigid requirements to execute all assigned requests using fixed software stacks on fixed hardware configurations. We argue that such assumptions are unrealistic in real-world decentralized environments. To this end, we propose WWW.Serve, a decentralized framework for interconnecting LLM services worldwide. It allows participants to flexibly determine their participation policies and resource commitments, and supports self-organizing request dispatch, enabling the network to autonomously allocate requests without centralized coordination. Empirically, we show that WWW.Serve improves global SLO (service-level-objective) attainment by up to 1.5x and lowers latency by 27.6%. Its performance approaches, and in some cases surpasses, centralized scheduling, while fully preserving the benefits of decentralization. These results highlight WWW.Serve as a promising foundation for real-world, decentralized LLM serving.

Paper Structure

This paper contains 23 sections, 4 theorems, 15 equations, 10 figures, and 3 tables.

Key Result

Lemma 5.5

Under Assumptions asm:node--asm:duel, the expected payoff of node $i$ from serving a single delegated request is […]. Consequently, the expected payoff rate of node $i$ under delegated request arrival rate $\lambda$ and PoS selection probability $p_i(t)$ is […].
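
To make the quantities named in the lemma concrete, the following is a minimal sketch, not the paper's formula. The lemma's closed-form expressions are not reproduced in this listing, and the sketch does not try to restore them. It assumes (a) that PoS selection is stake-proportional, i.e. $p_i = s_i / \sum_j s_j$, and (b) that the payoff rate is the product of the arrival rate $\lambda$, the selection probability $p_i(t)$, and a per-request expected payoff that is supplied as an input. The function names, the stake values, and the `payoff_per_request` parameter are all hypothetical.

```python
from typing import Dict


def selection_probabilities(stakes: Dict[str, float]) -> Dict[str, float]:
    """Assumed stake-proportional rule: p_i = s_i / sum_j s_j."""
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    return {node: stake / total for node, stake in stakes.items()}


def expected_payoff_rate(arrival_rate: float, p_i: float,
                         payoff_per_request: float) -> float:
    """Assumed composition: rate = lambda * p_i * per-request expected payoff.

    payoff_per_request stands in for the lemma's single-request payoff,
    whose closed form is not available here.
    """
    return arrival_rate * p_i * payoff_per_request


# Hypothetical three-node network; stake values are illustrative only.
stakes = {"a": 60.0, "b": 30.0, "c": 10.0}
probs = selection_probabilities(stakes)
rate_a = expected_payoff_rate(arrival_rate=5.0, p_i=probs["a"],
                              payoff_per_request=0.2)
```

Under these assumptions, a node's payoff rate scales linearly with its stake share, which is the kind of dependence the stake-share dynamics in Propositions 5.6 and 5.7 would build on.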

Figures (10)

  • Figure 1: WWW.Serve operates as an intermediate decentralized serving layer between users and LLM service providers, offering users access to an open and competitive market of worldwide LLM services while preserving service providers’ anonymity and flexibility. Within WWW.Serve, inference requests follow a collaborative workflow that performs decentralized routing, execution, and quality-aware evaluation.
  • Figure 2: Internal architecture of a single node. Each node within WWW.Serve is organized around five core managers: Request, Policy, Ledger, Model, and Communication, which together enable PoS-based request routing, policy-driven delegation, and efficient execution over heterogeneous LLM backends.
  • Figure 3: Duel-and-judge mechanism.
  • Figure 4: Comparison of global SLO attainment across single-node, centralized, and decentralized (WWW.Serve) deployments under four different experimental settings (Settings 1-4 from left to right; see the experimental-settings appendix for details).
  • Figure 5: Request latency under dynamic participation. Blue lines indicate node join/leave events; black lines show the windowed average latency.
  • ...and 5 more figures

Theorems & Definitions (8)

  • Lemma 5.5: Expected node payoff
  • proof
  • Proposition 5.6: Single-node stake-share dynamics
  • proof
  • Proposition 5.7: Group-level stake-share dynamics
  • proof
  • Theorem 5.8: High-quality equilibrium
  • proof