Fuzzing Automatic Differentiation in Deep-Learning Libraries

Chenyuan Yang; Yinlin Deng; Jiayi Yao; Yuxing Tu; Hanchi Li; Lingming Zhang

TL;DR

This work addresses bugs in automatic differentiation (AD) within DL libraries by introducing $\nabla$Fuzz, a fully automated API-level fuzzer that applies differential testing across multiple AD execution modes to verify outputs and gradients, including higher-order derivatives. Built on FreeFuzz and extended with test oracles and filtering strategies, it tests both first- and second-order gradients across PyTorch, TensorFlow, JAX, and OneFlow. The evaluation reports 173 detected bugs, of which 144 are confirmed by developers and 38 fixed; among the confirmed bugs, 117 were previously unknown and 107 are AD-related. $\nabla$Fuzz also outperforms existing fuzzers in code coverage and accounts for 7 of the 12 (58.3%) high-priority AD bugs for PyTorch and JAX during a two-month period. The approach offers a practical way to improve the reliability of DL libraries and a basis for future fuzzing and validation of differentiable computations.

Abstract

Deep learning (DL) has attracted wide attention and has been widely deployed in recent years. As a result, more and more research efforts have been dedicated to testing DL libraries and frameworks. However, existing work has largely overlooked one crucial component of any DL system, automatic differentiation (AD), which is the basis for the recent development of DL. To this end, we propose $\nabla$Fuzz, the first general and practical approach specifically targeting the critical AD component in DL libraries. Our key insight is that each DL library API can be abstracted into a function processing tensors/vectors, which can be differentially tested under various execution scenarios (for computing outputs/gradients with different implementations). We have implemented $\nabla$Fuzz as a fully automated API-level fuzzer targeting AD in DL libraries, which utilizes differential testing on different execution scenarios to test both first-order and high-order gradients, and also includes automated filtering strategies to remove false positives caused by numerical instability. We have performed an extensive study on four of the most popular and actively maintained DL libraries: PyTorch, TensorFlow, JAX, and OneFlow. The results show that $\nabla$Fuzz substantially outperforms state-of-the-art fuzzers in terms of both code coverage and bug detection. To date, $\nabla$Fuzz has detected 173 bugs in the studied DL libraries, with 144 already confirmed by developers (117 of which are previously unknown and 107 of which are related to AD). Remarkably, $\nabla$Fuzz contributed 58.3% (7/12) of all high-priority AD bugs for PyTorch and JAX during a two-month period. None of the confirmed AD bugs were detected by existing fuzzers.
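The differential-testing idea in the abstract can be illustrated with a minimal sketch. It is not the $\nabla$Fuzz implementation: it uses only the Python standard library, a toy forward-mode AD built on dual numbers, and the function $f(x_1,x_2) = \log(x_1\cdot x_2) + \sin(x_1)$ listed as Figure 3. The names `Dual`, `forward_mode_grad`, `numerical_grad`, and `gradients_agree`, as well as the step size and tolerances, are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """Dual number: a value plus its derivative along one input direction."""
    val: float
    dot: float

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # Product rule.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)


def d_log(x):
    return Dual(math.log(x.val), x.dot / x.val)


def d_sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def f_plain(x1, x2):
    """Scenario 1: direct evaluation without AD."""
    return math.log(x1 * x2) + math.sin(x1)


def f_dual(x1, x2):
    """Scenario 2: the same function evaluated on dual numbers."""
    return d_log(x1 * x2) + d_sin(x1)


def forward_mode_grad(x1, x2):
    """Gradient via toy forward-mode AD, one pass per input."""
    g1 = f_dual(Dual(x1, 1.0), Dual(x2, 0.0)).dot
    g2 = f_dual(Dual(x1, 0.0), Dual(x2, 1.0)).dot
    return (g1, g2)


def numerical_grad(x1, x2, eps=1e-6):
    """Gradient via central finite differences (numerical differentiation)."""
    g1 = (f_plain(x1 + eps, x2) - f_plain(x1 - eps, x2)) / (2 * eps)
    g2 = (f_plain(x1, x2 + eps) - f_plain(x1, x2 - eps)) / (2 * eps)
    return (g1, g2)


def gradients_agree(a, b, rtol=1e-4, atol=1e-6):
    """Tolerance-based comparison; the thresholds here are illustrative."""
    return all(math.isclose(x, y, rel_tol=rtol, abs_tol=atol)
               for x, y in zip(a, b))


if __name__ == "__main__":
    x1, x2 = 2.0, 5.0
    # Output check: the value computed with and without AD should match.
    out_ok = math.isclose(f_dual(Dual(x1, 0.0), Dual(x2, 0.0)).val,
                          f_plain(x1, x2))
    # Gradient check: AD gradient vs. numerical gradient.
    grad_ok = gradients_agree(forward_mode_grad(x1, x2),
                              numerical_grad(x1, x2))
    print("output check:", out_ok, "| gradient check:", grad_ok)
```

A disagreement in either check would flag a candidate bug. A real fuzzer would additionally need to filter out mismatches caused by numerical instability or non-differentiable inputs, which this sketch does not attempt.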

Paper Structure (33 sections, 7 equations, 11 figures, 12 tables, 1 algorithm)

Introduction
Background
Basics about DL Libraries
Automatic Differentiation
Preliminaries
Mathematics behind Automatic Differentiation
Numerical Differentiation
Approach
API-level Fuzzer
Test Oracles
Output Check
Gradient Check
High-order Gradients
Filtering Strategies
Differentiability
...and 18 more sections

Figures (11)

Figure 1: Crash bug in AD
Figure 2: An example of DL model training and inference
Figure 3: Function $f(x_1,x_2) = \log(x_1\cdot x_2) + \sin(x_1)$
Figure 4: Overview of $\nabla$Fuzz
Figure 5: Inconsistent outputs w/ and w/o AD
...and 6 more figures

Theorems & Definitions (6)

Definition 1
Definition 2
Definition 3
Definition 4
Definition 5
Definition 6
