Context-Aware Fuzzing for Robustness Enhancement of Deep Learning Models

Haipeng Wang; Zhengyuan Wei; Qilin Zhou; Wing-Kwong Chan

TL;DR

The paper introduces Contextual Confidence (CC), a black-box, non-coverage-based metric for evaluating the representativeness of test cases, and Clover, a context-aware fuzzing framework designed to improve the robustness of deep learning models within a testing-retraining pipeline. Clover leverages seed equivalence and adversarial front objects (α- and β-AFOs) to generate test cases and constructs a robustness-focused test suite through a layered, CC-driven selection process. Empirical results across multiple datasets and models show that Clover outperforms state-of-the-art selection and fuzzing techniques (Adapt and RobOT) in the quantity and quality of test cases and in the robustness improvement achieved after retraining. Configurations of Clover that use CC generally outperform configurations based on DeepGini or FOL. The work demonstrates the practical value of context-aware fuzzing for robustness and outlines future directions, including refinements to seed equivalence and broader validation across architectures and defenses.

Abstract

In the testing-retraining pipeline for enhancing the robustness property of deep learning (DL) models, many state-of-the-art robustness-oriented fuzzing techniques are metric-oriented. The pipeline generates adversarial examples as test cases via such a DL testing technique and retrains the DL model under test with test suites that contain these test cases. On the one hand, the strategies of these fuzzing techniques tightly integrate the key characteristics of their testing metrics. On the other hand, they are often unaware of whether their generated test cases are different from the samples surrounding these test cases and whether there are relevant test cases of other seeds when generating the current one. We propose a novel testing metric called Contextual Confidence (CC). CC measures a test case through the surrounding samples of a test case in terms of their mean probability predicted to the prediction label of the test case. Based on this metric, we further propose a novel fuzzing technique Clover as a DL testing technique for the pipeline. In each fuzzing round, Clover first finds a set of seeds whose labels are the same as the label of the seed under fuzzing. At the same time, it locates the corresponding test case that achieves the highest CC values among the existing test cases of each seed in this set of seeds and shares the same prediction label as the existing test case of the seed under fuzzing that achieves the highest CC value. Clover computes the piece of difference between each such pair of a seed and a test case. It incrementally applies these pieces of differences to perturb the current test case of the seed under fuzzing that achieves the highest CC value and to perturb the resulting samples along the gradient to generate new test cases for the seed under fuzzing.
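The CC metric described in the abstract can be illustrated with a minimal sketch. The abstract specifies only that CC is the mean probability that the samples surrounding a test case receive for the test case's prediction label. Everything else below is an assumption made for illustration and is not taken from the paper: the function name `contextual_confidence`, the uniform sampling inside an L-infinity ball, the `radius` and `n_samples` parameters, and the two-class `toy_model`.

```python
import numpy as np

def contextual_confidence(predict_proba, test_case, radius=0.05, n_samples=32, seed=0):
    """CC-like score: mean probability that surrounding samples assign
    to the label predicted for the test case itself.

    predict_proba: callable mapping a batch (n, ...) to probabilities (n, classes).
    The neighbourhood sampling scheme here is an assumption, not the paper's.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(test_case, dtype=float)
    # Prediction label of the test case itself.
    label = int(np.argmax(predict_proba(x[None, ...])[0]))
    # Assumed neighbourhood: uniform noise within an L-infinity ball around x.
    noise = rng.uniform(-radius, radius, size=(n_samples,) + x.shape)
    neighbours = x[None, ...] + noise
    # Mean probability of the test case's label over the surrounding samples.
    probs = predict_proba(neighbours)
    return float(probs[:, label].mean())

def toy_model(batch):
    """Hypothetical two-class softmax classifier used only for this example."""
    weights = np.array([[1.0, -1.0], [-1.0, 1.0]])
    logits = np.asarray(batch, dtype=float) @ weights
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

x = np.array([2.0, 0.0])
cc = contextual_confidence(toy_model, x, radius=0.5)
```

Because the sketch only calls `predict_proba`, it treats the model as a black box, which matches how the TL;DR characterizes CC. With `radius=0.0` every surrounding sample coincides with the test case, so the score reduces to the model's own confidence in its predicted label.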

Paper Structure (72 sections, 4 equations, 26 figures, 15 tables, 4 algorithms)

Introduction
Preliminaries
Deep Neural Networks
DL Testing with the Target of Robustness Improvement for Deep Learning Models
Why Selecting Test Cases for Improving the Robustness Property of DL Models?
Related Methods for Selecting Test Cases and their Limitations
Model Retraining in the Testing-Retraining Pipeline
Testing-Retraining Pipeline for Testing DL Models for Robustness Improvement
Auxiliary functions
Clover
Intuition
Measuring the Representativeness of Test Cases
Conceptual Fuzzing Model of Clover
Overview
Track representative test case
...and 57 more sections

Figures (26)

Figure 1: Illustration for Calculating CC Values for Different Test Cases Generated from the Same Seed.
Figure 2: Relationship between the Seed $z_i$ in $Z$ and its Test Cases and $\alpha$-Representative Test Case
Figure 3: Relationship among Seeds in $Z$, Seeds in $X$, and $\alpha$-Representative Test Cases
Figure 4: Relationship between $\alpha$-AFO and $\beta$-AFO at the $(i+1)^{\text{th}}$ Fuzzing Round
Figure 5: Equivalence Classes at the $(i+1)^{\text{th}}$ Fuzzing Round.
...and 21 more figures

Theorems & Definitions (1)

Definition 1: Seed Equivalence and Equivalence Class of Seeds
