PRISM: Dynamic Primitive-Based Forecasting for Large-Scale GPU Cluster Workloads

Xin Wu, Fei Teng, Xingwang Li, Bin Zheng, Qiang Duan

Abstract

Accurate GPU workload forecasting is essential for AI infrastructure, as it enables efficient scheduling, resource allocation, and power management. Modern workloads are highly volatile, multi-periodic, and heterogeneous, which makes them challenging for traditional predictors. We propose PRISM, a primitive-based compositional forecasting framework that combines dictionary-driven temporal decomposition with adaptive spectral refinement. This dual representation extracts stable, interpretable workload signatures across diverse GPU jobs. Evaluated on large-scale production traces, PRISM achieves state-of-the-art results. It significantly reduces burst-phase errors, providing a robust, architecture-aware foundation for dynamic resource management in GPU-powered AI platforms.

Paper Structure

This paper contains 11 sections, 15 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Contrast in resource request profiles between two large-scale production clusters. In the 2020 cluster, CPU-based requests formed the majority (60.1%) [Hu2021SC]. In contrast, the 2024 cluster profile is GPU-centric and polarized: it centers on single-GPU requests (67.5%) while expanding toward coarse-grained (13.2%) and fine-grained (10.5%) allocations [duan2025gfs].
  • Figure 2: A multi-faceted analysis revealing GPU demand volatility, periodicity, and heterogeneity.
  • Figure 3: An overview of the PRISM framework.
  • Figure 4: Main forecasting results of PRISM and baseline models at prediction lengths of 6, 12, 24, and 48.
  • Figure 5: Performance comparison of PRISM against baselines.
  • ...and 1 more figure