ScaleDL: Towards Scalable and Efficient Runtime Prediction for Distributed Deep Learning Workloads

Xiaokai Wang, Shaoyuan Huang, Yuting Li, Xiaofei Wang

TL;DR

ScaleDL targets accurate, scalable runtime prediction for distributed DNN workloads. It combines fine-grained layer-wise models with a Transformer-based GNN that captures cross-layer interactions and all-reduce communication costs. D-optimal sampling reduces the amount of profiling data required while maintaining accuracy: both the per-layer predictors and the graph model are trained on the most informative samples. Compared with baselines, the approach achieves up to 6× lower mean relative error (MRE) and 5× lower root mean squared error (RMSE), and it generalizes to architectures unseen during training (the out-of-distribution, OOD, setting). This combination of data efficiency, accuracy, and generalizability makes ScaleDL practical for resource scheduling and deployment optimization in heterogeneous, large-scale DL systems.
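
The two error metrics quoted above can be sketched as follows. This is a minimal illustration using the common textbook definitions; the paper's exact formulas are not given in this summary, so the normalization (per-sample relative error, averaged) is an assumption, and the function names are hypothetical.

```python
import math


def mean_relative_error(actual, predicted):
    """Average of |predicted - actual| / |actual| over all samples.

    Assumed definition; every actual runtime must be non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined for a zero actual value")
    return sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the runtimes."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))
```

MRE is scale-free, so it treats a 10% miss on a short job the same as a 10% miss on a long one, whereas RMSE is dominated by the largest absolute misses. Reporting both, as the summary does, covers each of these views.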

Abstract

Deep neural networks (DNNs) form the cornerstone of modern AI services, supporting a wide range of applications, including autonomous driving, chatbots, and recommendation systems. As models grow in size and complexity, DNN workloads such as training and inference tasks impose unprecedented demands on distributed computing resources, making accurate runtime prediction essential for optimizing development and resource allocation. Traditional methods rely on additive computational unit models, which limits their accuracy and generalizability. Graph-enhanced modeling improves prediction performance but significantly increases data collection costs. A method is therefore needed that balances accuracy, generalizability, and data collection cost. To this end, we propose ScaleDL, a novel runtime prediction framework that combines nonlinear layer-wise modeling with a graph neural network (GNN)-based cross-layer interaction mechanism, enabling accurate DNN runtime prediction and hierarchical generalizability across different network architectures. Additionally, we employ the D-optimal method to reduce data collection costs. Experiments on the workloads of five popular DNN models demonstrate that ScaleDL improves runtime prediction accuracy and generalizability, achieving 6 times lower mean relative error (MRE) and 5 times lower root mean squared error (RMSE) than baseline models.
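
To make the D-optimal idea concrete, here is a minimal sketch of a generic greedy D-optimal design. It is not the authors' procedure, which this summary does not describe. It picks k profiling configurations from a candidate pool so that the determinant of the information matrix XᵀX grows as much as possible at each step. The function name, the ridge term, and the example feature encoding are all assumptions made for illustration.

```python
def greedy_d_optimal(candidates, k, ridge=1e-3):
    """Greedily pick k rows that maximize det(ridge*I + X_S^T X_S).

    Uses the matrix determinant lemma: adding row x multiplies the
    determinant by (1 + x^T M^-1 x), so each step picks the row with the
    largest x^T M^-1 x and then updates M^-1 via Sherman-Morrison.
    """
    n = len(candidates)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of candidates")
    d = len(candidates[0])
    # Inverse of the regularized information matrix, initially (ridge*I)^-1.
    m_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    selected = []
    for _ in range(k):
        best_idx, best_gain, best_mx = None, -1.0, None
        for idx, x in enumerate(candidates):
            if idx in selected:
                continue
            mx = [sum(m_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
            gain = sum(x[i] * mx[i] for i in range(d))  # x^T M^-1 x
            if gain > best_gain:
                best_idx, best_gain, best_mx = idx, gain, mx
        selected.append(best_idx)
        # Sherman-Morrison rank-one update of M^-1 for the chosen row.
        denom = 1.0 + best_gain
        for i in range(d):
            for j in range(d):
                m_inv[i][j] -= best_mx[i] * best_mx[j] / denom
    return selected


# Hypothetical candidate pool: each row is [intercept, batch size].
pool = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 10.0]]
chosen = greedy_d_optimal(pool, 2)
```

For a linear model in batch size, the sketch selects the two extreme batch sizes, which matches the intuition that widely spread measurements constrain a fitted line best. Only the selected configurations would then need to be profiled.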

Paper Structure

This paper contains 14 sections, 13 equations, 6 figures, 1 table, and 1 algorithm.

Figures (6)

  • Figure 1: Objectives of DNN workloads runtime prediction.
  • Figure 2: Overview of ScaleDL.
  • Figure 3: DNN runtime model within ScaleDL for ViT.
  • Figure 4: Accuracy in predicting runtime for the BERT model across different key parameters on a fixed dataset: (a) epoch runtime vs. batch size and (b) epoch runtime vs. sequence length.
  • Figure 5: Average accuracy across different DNN models under OOD settings. Prediction frameworks are trained on all data without the target DNN.
  • ...and 1 more figure