PerfSeer: An Efficient and Accurate Deep Learning Models Performance Predictor
Xinlong Zhao, Jiande Sun, Jia Zhang, Sujuan Hou, Shuai Li, Tong Liu, Ke Liu
TL;DR
PerfSeer predicts deep learning model performance by encoding each model as a graph, PerfGraph, that captures topology together with node, edge, and global features. It introduces SeerNet, a Graph Neural Network that learns from this representation using Synergistic Max-Mean aggregation (SynMM) and Global-Node Perspective Boost (GNPB); SeerNet-Multi extends it to multi-metric prediction via PCGrad. On a dataset of 53k+ model configurations collected on an RTX 3090 and spanning major architectures, PerfSeer achieves a mean MAPE of $5.14\%$ with SeerNet and $7.75\%$ with SeerNet-Multi (PCGrad), outperforming baselines such as nn-Meter, Brp-NAS, and DIPPM. The approach has low deployment and usage overhead and applies across devices and DL frameworks via ONNX, enabling scalable, accurate performance forecasting for NAS and scheduling tasks.
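For readers unfamiliar with the metric, the sketch below shows how a mean absolute percentage error (MAPE) is commonly computed. It assumes the standard definition, $\text{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$; the function name `mape` and the sample values are illustrative and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Assumes the standard definition; every value in `actual` must be nonzero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the relative errors, then scale to a percentage.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical measured vs. predicted execution times (ms).
measured = [100.0, 200.0]
estimated = [110.0, 180.0]
error_percent = mape(measured, estimated)
```

Both hypothetical predictions are off by 10% of the measured value, so `error_percent` is 10 here.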
Abstract
Predicting the performance of deep learning (DL) models, in terms of metrics such as execution time and resource utilization, is crucial for Neural Architecture Search (NAS), DL cluster schedulers, and other technologies that advance deep learning. The representation of a model is the foundation for its performance prediction. However, existing methods cannot comprehensively represent diverse model configurations, which results in unsatisfactory accuracy. To address this, we represent a model as a graph that includes its topology along with node, edge, and global features, all of which are crucial for capturing the model's performance. Based on this representation, we propose PerfSeer, a novel predictor built on SeerNet, a Graph Neural Network (GNN)-based performance prediction model. SeerNet fully leverages the topology and the various features, and incorporates optimizations such as Synergistic Max-Mean aggregation (SynMM) and Global-Node Perspective Boost (GNPB) to capture critical performance information more effectively, enabling accurate performance prediction. Furthermore, SeerNet can be extended to SeerNet-Multi by using Projecting Conflicting Gradients (PCGrad), which enables efficient simultaneous prediction of multiple performance metrics without significantly affecting accuracy. We constructed a dataset containing performance metrics for 53k+ model configurations, including execution time, memory usage, and Streaming Multiprocessor (SM) utilization during both training and inference. The evaluation results show that PerfSeer outperforms nn-Meter, Brp-NAS, and DIPPM.
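To illustrate the idea behind PCGrad mentioned above, the following is a minimal sketch of its gradient projection step on plain Python lists. It is not SeerNet-Multi's implementation: the function names `dot` and `pcgrad` are hypothetical, the gradients are toy vectors, and the other tasks are visited in a fixed order, whereas the published PCGrad algorithm visits them in random order.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcgrad(grads):
    """Combine per-task gradients after removing conflicting components.

    For each task gradient g_i and every other task gradient g_j, if the
    two conflict (negative inner product), g_i is projected onto the
    normal plane of g_j. The projected gradients are then summed.
    """
    projected = []
    for i, g in enumerate(grads):
        g_pc = list(g)
        for j, other in enumerate(grads):
            if i == j:
                continue
            d = dot(g_pc, other)
            if d < 0:  # conflict: subtract the component along `other`
                scale = d / dot(other, other)
                g_pc = [a - scale * b for a, b in zip(g_pc, other)]
        projected.append(g_pc)
    # Sum the projected gradients coordinate-wise.
    return [sum(column) for column in zip(*projected)]


# Two toy task gradients that conflict (their inner product is -1).
time_grad = [1.0, 0.0]
memory_grad = [-1.0, 1.0]
update = pcgrad([time_grad, memory_grad])
```

In this toy case the first gradient becomes `[0.5, 0.5]` and the second `[0.0, 1.0]`, so neither projected gradient points against the other task's original gradient. Gradients that do not conflict are passed through unchanged and simply summed.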
