GPU-Fuzz: Finding Memory Errors in Deep Learning Frameworks
Zihao Li, Hongyi Lu, Yanan Guo, Zhenkai Zhang, Shuai Wang, Fengwei Zhang
TL;DR
This work tackles GPU memory robustness in deep learning (DL) frameworks by introducing GPU-Fuzz, a constraint-guided fuzzer that targets the operator parameter space rather than network topology. It models operator semantics as constraint formulas and uses the SMT solver Z3 to generate diverse, boundary-focused inputs that stress CUDA kernels. The approach uncovered 13 previously unknown bugs across PyTorch, TensorFlow, and PaddlePaddle, including numerous silent memory corruptions, which highlights a critical blind spot in existing fuzzers. By supporting cross-framework execution and integrating NVIDIA's compute-sanitizer for reproducible detection, GPU-Fuzz offers a practical path toward stronger GPU memory safety in the DL ecosystem.
Abstract
GPU memory errors are a critical threat to deep learning (DL) frameworks, leading to crashes or even security issues. We introduce GPU-Fuzz, a fuzzer that efficiently locates these errors by modeling operator parameters as formal constraints. GPU-Fuzz uses a constraint solver to generate test cases that systematically probe error-prone boundary conditions in GPU kernels. Applying GPU-Fuzz to PyTorch, TensorFlow, and PaddlePaddle, we uncovered 13 previously unknown bugs, demonstrating its effectiveness in finding memory errors.
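The idea of encoding operator parameters as constraints and solving for boundary cases can be illustrated with a small, hypothetical example. The sketch below models a 1-D convolution-like operator. The names `ConvParams`, `is_valid`, `is_boundary`, `solve`, and `diverse_cases` are illustrative and do not come from GPU-Fuzz. To stay dependency-free, it replaces the Z3 query with brute-force enumeration over a small bounded domain. The validity constraint encodes the operator's semantics (positive sizes, non-empty output). The boundary constraint asks for the tight fit where the dilated kernel exactly spans the padded input, the kind of edge case where an off-by-one index computation in a kernel could read out of bounds.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List


@dataclass(frozen=True)
class ConvParams:
    """Hypothetical parameter vector for a 1-D convolution-like operator."""
    length: int    # input length along one spatial axis
    kernel: int    # kernel size
    stride: int
    padding: int   # zero-padding applied to each side
    dilation: int

    def effective_kernel(self) -> int:
        # Span covered by a dilated kernel.
        return self.dilation * (self.kernel - 1) + 1

    def output_length(self) -> int:
        # Standard output-size formula for a strided, dilated, padded convolution.
        padded = self.length + 2 * self.padding
        return (padded - self.effective_kernel()) // self.stride + 1


def is_valid(p: ConvParams) -> bool:
    """Semantic constraints: parameters in range and output length >= 1."""
    return (p.length >= 1 and p.kernel >= 1 and p.stride >= 1
            and p.dilation >= 1 and p.padding >= 0
            and p.length + 2 * p.padding >= p.effective_kernel())


def is_boundary(p: ConvParams) -> bool:
    """Boundary constraint: the dilated kernel exactly covers the padded input."""
    return p.length + 2 * p.padding == p.effective_kernel()


def solve(max_value: int = 4) -> Iterator[ConvParams]:
    """Brute-force stand-in for an SMT query: yield every assignment in a
    bounded domain that satisfies both the validity and boundary constraints."""
    domain = range(0, max_value + 1)
    for length, kernel, stride, padding, dilation in product(domain, repeat=5):
        p = ConvParams(length, kernel, stride, padding, dilation)
        if is_valid(p) and is_boundary(p):
            yield p


def diverse_cases(limit: int = 5, max_value: int = 4) -> List[ConvParams]:
    """Keep at most one case per effective kernel size, loosely mimicking the
    blocking clauses a solver loop would add to force each new model to differ."""
    seen = set()
    cases: List[ConvParams] = []
    for p in solve(max_value):
        key = p.effective_kernel()
        if key not in seen:
            seen.add(key)
            cases.append(p)
            if len(cases) == limit:
                break
    return cases


if __name__ == "__main__":
    for case in diverse_cases():
        print(case, "-> output length", case.output_length())
```

Every generated case produces an output of length 1 while the padding dominates the input, which is exactly the corner a random sampler rarely hits. In a real pipeline, each tuple would be turned into an operator call and run under a memory checker such as compute-sanitizer. With an actual SMT solver, the enumeration loop would become a solver loop that adds a blocking clause after each model.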
