GPU-Fuzz: Finding Memory Errors in Deep Learning Frameworks

Zihao Li, Hongyi Lu, Yanan Guo, Zhenkai Zhang, Shuai Wang, Fengwei Zhang

TL;DR

This work tackles GPU memory robustness in deep learning frameworks by introducing GPU-Fuzz, a constraint-guided fuzzer that explores the operator parameter space rather than network topology. It models operator semantics as constraint formulas and uses the SMT solver Z3 to generate diverse, boundary-focused inputs that stress CUDA kernels. The approach uncovered 13 previously unknown bugs across PyTorch, TensorFlow, and PaddlePaddle, including numerous silent memory corruptions, which points to a critical blind spot in existing fuzzers. By enabling cross-framework execution and integrating compute-sanitizer for reproducible detection, GPU-Fuzz provides a practical path toward stronger GPU memory safety in the DL ecosystem.

Abstract

GPU memory errors are a critical threat to deep learning (DL) frameworks, leading to crashes or even security issues. We introduce GPU-Fuzz, a fuzzer that locates these issues efficiently by modeling operator parameters as formal constraints. GPU-Fuzz uses a constraint solver to generate test cases that systematically probe error-prone boundary conditions in GPU kernels. Applied to PyTorch, TensorFlow, and PaddlePaddle, GPU-Fuzz uncovered 13 previously unknown bugs, demonstrating its effectiveness in finding memory errors.
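
To make the idea of constraint-guided, boundary-focused input generation concrete, the following is a minimal sketch for a single convolution axis. It is not the paper's implementation: GPU-Fuzz uses the Z3 SMT solver, whereas this sketch substitutes exhaustive enumeration over a small domain so that it runs with the Python standard library alone. The constraints are the standard convolution shape relation, assumed here for illustration, and the names `conv_out_size`, `is_valid`, and `boundary_cases` are hypothetical.

```python
from itertools import product


def conv_out_size(size, kernel, stride, padding, dilation):
    """Output length along one spatial axis (standard convolution shape formula)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def is_valid(size, kernel, stride, padding, dilation):
    """Assumed operator constraints: positive sizes, non-negative padding,
    and a dilated kernel that fits inside the padded input."""
    if min(size, kernel, stride, dilation) < 1 or padding < 0:
        return False
    effective_kernel = dilation * (kernel - 1) + 1
    return effective_kernel <= size + 2 * padding


def boundary_cases(max_value=4):
    """Yield valid parameter sets where the fit constraint is tight, i.e. the
    dilated kernel exactly covers the padded input.

    Brute-force enumeration stands in for the SMT solver here.
    """
    domain = range(0, max_value + 1)
    for size, kernel, stride, padding, dilation in product(domain, repeat=5):
        if not is_valid(size, kernel, stride, padding, dilation):
            continue
        effective_kernel = dilation * (kernel - 1) + 1
        if effective_kernel == size + 2 * padding:
            yield {
                "size": size,
                "kernel": kernel,
                "stride": stride,
                "padding": padding,
                "dilation": dilation,
            }


if __name__ == "__main__":
    # Print a few boundary cases together with their output length.
    for case in list(boundary_cases(max_value=3))[:5]:
        print(case, "-> output length", conv_out_size(**case))
```

Each yielded dictionary could be passed to a framework's convolution operator under a memory checker; enumeration only scales to tiny domains, which is one reason a solver is the more practical choice in a real fuzzer.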

Paper Structure (19 sections, 9 figures, 4 tables)

Figures (9)

  • Figure 1: GPU memory layout and potential attacks.
  • Figure 2: From Python API to CUDA kernel.
  • Figure 3: The architecture of the GPU-Fuzz system.
  • Figure 4: Constraint modeling for convolution operators.
  • Figure 5: Constraint solving process.
  • ...and 4 more figures