WWW.Serve: Interconnecting Global LLM Services through Decentralization

Huanyu Wang; Ziyu Xia; Zhuoming Chen; Beidi Chen

Abstract

Large language model (LLM) services are mostly centralized, which creates scalability bottlenecks and leaves substantial scattered GPU resources underutilized. Decentralization offers a promising alternative, but existing frameworks focus primarily on cooperation among GPU providers and overlook their inherent competitive dynamics. As a result, they impose substantial constraints, such as excessive platform-level oversight or rigid requirements to execute all assigned requests using fixed software stacks on fixed hardware configurations. We argue that such assumptions are unrealistic in real-world decentralized environments. To this end, we propose WWW.Serve, a decentralized framework for interconnecting LLM services worldwide. It allows participants to flexibly determine their participation policies and resource commitments, and it supports self-organizing request dispatch, enabling the network to allocate requests autonomously without centralized coordination. Empirically, we show that WWW.Serve improves global SLO (service-level-objective) attainment by up to 1.5x and lowers latency by 27.6%. Its performance approaches, and in some cases surpasses, that of centralized scheduling, while fully preserving the benefits of decentralization. These results highlight WWW.Serve as a promising foundation for real-world decentralized LLM serving.

Paper Structure (23 sections, 4 theorems, 15 equations, 10 figures, 3 tables)

Introduction
Related Work
WWW.Serve's Overview
Network Architecture
Request Routing and Node Design
Core Mechanisms
Credit-based Transaction System
Duel-and-Judge Mechanism
Policy Framework
Game-Theory Analysis
Empirical Evaluation
Scheduling Efficiency
Dynamic Participation
Quality Incentivization
Ablation Study
...and 8 more sections

Key Result

Lemma 5.5

Under Assumptions asm:node--asm:duel, the expected payoff of node $i$ from serving a single delegated request is [equation not recovered in this extract]. Consequently, the expected payoff rate of node $i$ under delegated request arrival rate $\lambda$ and PoS selection probability $p_i(t)$ is [equation not recovered in this extract].
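The lemma relates a node's payoff rate to the delegated arrival rate $\lambda$ and its PoS selection probability $p_i(t)$. The sketch below is only an illustration of how these quantities could fit together, not the paper's formula. It rests on two labeled assumptions: that the selection probability is proportional to stake, and that the payoff rate factorizes as arrival rate times selection probability times per-request payoff. The function names and the `payoff_per_request` parameter are hypothetical.

```python
from typing import Dict


def pos_selection_probabilities(stakes: Dict[str, float]) -> Dict[str, float]:
    """Return a selection probability for each node.

    ASSUMPTION: p_i = stake_i / (sum of all stakes). The source names a
    "PoS selection probability" but does not give its formula here.
    """
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    return {node: stake / total for node, stake in stakes.items()}


def expected_payoff_rate(
    arrival_rate: float, p_i: float, payoff_per_request: float
) -> float:
    """Expected payoff per unit time for one node.

    ASSUMPTION: the rate factorizes as lambda * p_i * (expected payoff of a
    single delegated request); the per-request payoff is taken as an input
    because its closed form is not available in this extract.
    """
    return arrival_rate * p_i * payoff_per_request


if __name__ == "__main__":
    # Hypothetical stakes for two nodes.
    probs = pos_selection_probabilities({"node_a": 1.0, "node_b": 3.0})
    rate_a = expected_payoff_rate(
        arrival_rate=8.0, p_i=probs["node_a"], payoff_per_request=2.0
    )
    print(probs, rate_a)
```

Under these assumptions, a node's payoff rate grows linearly with its share of the total stake, which is the kind of coupling that stake-share dynamics would build on.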

Figures (10)

Figure 1: WWW.Serve operates as an intermediate decentralized serving layer between users and LLM service providers, offering users access to an open and competitive market of worldwide LLM services while preserving service providers’ anonymity and flexibility. Within WWW.Serve, inference requests follow a collaborative workflow that performs decentralized routing, execution, and quality-aware evaluation.
Figure 2: Internal architecture of a single node. Each node within WWW.Serve is organized around five core managers: Request, Policy, Ledger, Model, and Communication, which together enable PoS-based request routing, policy-driven delegation, and efficient execution over heterogeneous LLM backends.
Figure 3: Duel-and-judge mechanism.
Figure 4: Comparison of global SLO attainment across single-node, centralized, and decentralized (WWW.Serve) deployments under four different experimental settings (Settings 1-4 from left to right; see Appendix \ref{app:setting} for details).
Figure 5: Request latency under dynamic participation. Blue lines indicate node join/leave events; black lines show the windowed average latency.
...and 5 more figures

Theorems & Definitions (8)

Lemma 5.5: Expected node payoff
proof
Proposition 5.6: Single-node stake-share dynamics
proof
Proposition 5.7: Group-level stake-share dynamics
proof
Theorem 5.8: High-quality equilibrium
proof
