Agora: Bridging the GPU Cloud Resource-Price Disconnect

Ian McDougall, Noah Scott, Joon Huh, Kirthevasan Kandasamy, Karthikeyan Sankaralingam

TL;DR

Addressing the misalignment between GPU compute growth and memory bandwidth economics, the paper proposes feature-based pricing and the Agora architecture to align price with actual resource usage, particularly bandwidth. It provides an economic and algorithmic definition of feature-based pricing and a secure system design for auditable billing. Empirical evaluation across hundreds of GPU applications and three generations (A100, H100, Blackwell) demonstrates that high-frequency sampling, e.g., $50\mu s$, yields near-ideal pricing with about 5% revenue loss, while $10\mu s$ achieves 2.4% loss. The results indicate a practical, transparent GPU cloud market that improves resource allocation for bandwidth-bound workloads such as LLM inference.

Abstract

The historic trend of Moore's Law, which predicted exponential growth in computational performance per dollar, no longer holds uniformly for modern Graphics Processing Units (GPUs). While Floating Point Operations per Second (FLOPS) capabilities have continued to scale economically, memory bandwidth has not, creating a significant price-performance disconnect. This paper argues that the prevailing time-based pricing models for cloud GPUs are economically inefficient for bandwidth-bound workloads. These models fail to account for the rising marginal cost of memory bandwidth, leading to market distortions and suboptimal hardware allocation. To address this, we propose a novel feature-based pricing framework that directly links cost to resource consumption, including but not limited to memory bandwidth. We provide a robust economic and algorithmic definition of this framework and introduce Agora, a practical and secure system architecture for its implementation. Our implementation of Agora shows that sampling every 50 µs yields pricing close to what ideal sampling would provide, losing only 5% of revenue. Sampling every 10 µs does better still, reducing the loss to 2.4%. Modern telemetry systems can already provide this rate of measurement, and our prototype implementation shows that the system design for feature-based pricing is buildable. Our evaluation across diverse GPU applications and hardware generations empirically validates the effectiveness of our approach in creating a more transparent and efficient market for cloud GPU resources.
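
To make the link between sampling period and revenue loss concrete, the following is a minimal sketch and not the paper's pricing algorithm. It assumes a hypothetical convex price rate (standing in for the rising marginal cost of bandwidth) and assumes that telemetry reports the average bandwidth utilization over each sampling window. The names `price_rate`, `billed_revenue`, and `revenue_loss`, the price coefficients, and the synthetic trace are all illustrative assumptions. Under these assumptions, averaging over a coarse window hides short bursts, so the bill falls below what per-sample billing would charge.

```python
def price_rate(bw_util: float) -> float:
    """Hypothetical convex price rate (dollars per second) for a bandwidth
    utilization in [0, 1]. The coefficients are illustrative assumptions."""
    return 1.0 + 3.0 * bw_util ** 2


def billed_revenue(trace, step_us: int, window_us: int) -> float:
    """Bill a utilization trace sampled every step_us microseconds, charging
    each window_us window at the price of its average utilization."""
    per_window = window_us // step_us
    total = 0.0
    for start in range(0, len(trace), per_window):
        chunk = trace[start:start + per_window]
        avg = sum(chunk) / len(chunk)
        # Window duration in seconds times the price rate at the average.
        total += price_rate(avg) * len(chunk) * step_us * 1e-6
    return total


def revenue_loss(trace, step_us: int, window_us: int) -> float:
    """Fraction of ideal (per-sample) revenue lost by billing on coarser windows."""
    ideal = billed_revenue(trace, step_us, step_us)
    coarse = billed_revenue(trace, step_us, window_us)
    return (ideal - coarse) / ideal


# Synthetic bursty workload (an assumption): 20 us at 90% utilization,
# then 20 us at 10%, repeated, sampled every 1 us.
trace = ([0.9] * 20 + [0.1] * 20) * 50

for window_us in (1, 10, 50):
    loss = revenue_loss(trace, step_us=1, window_us=window_us)
    print(f"window {window_us:>2} us: revenue loss {loss:.2%}")
```

In this toy trace, 10 µs windows line up with the 20 µs bursts and lose essentially nothing, while 50 µs windows blend high and low phases and undercharge. The 5% and 2.4% figures reported in the abstract come from the authors' evaluation on real applications and are not reproduced by this sketch.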

Paper Structure

This paper contains 6 sections, 2 tables.