Robust Black-box Testing of Deep Neural Networks using Co-Domain Coverage

Aishwarya Gupta, Indranil Saha, Piyush Rai

TL;DR

The paper proposes Co-Domain Coverage (CDC), an end-to-end, black-box coverage criterion for DNN testing that is defined over the model’s output space rather than its internal activations. It also introduces CoDoFuzz, a fuzzing framework guided by CDC that generates diverse test suites containing many misclassified and low-confidence inputs; the evaluation spans six datasets and multiple architectures. Empirically, CDC-guided testing yields the largest number of erroneous inputs, high output diversity, and the greatest gains when the model is retrained on the generated data, outperforming neuron- and layer-based baselines. The approach supports scalable black-box robustness assessment and data augmentation for improving DNN reliability in real-world deployments.

Abstract

Rigorous testing of machine learning models is necessary for trustworthy deployments. We present a novel black-box approach for generating test-suites for robust testing of deep neural networks (DNNs). Most existing methods create test inputs based on maximizing some "coverage" criterion/metric such as a fraction of neurons activated by the test inputs. Such approaches, however, can only analyze each neuron's behavior or each layer's output in isolation and are unable to capture their collective effect on the DNN's output, resulting in test suites that often do not capture the various failure modes of the DNN adequately. These approaches also require white-box access, i.e., access to the DNN's internals (node activations). We present a novel black-box coverage criterion called Co-Domain Coverage (CDC), which is defined as a function of the model's output and thus takes into account its end-to-end behavior. Subsequently, we develop a new fuzz testing procedure named CoDoFuzz, which uses CDC to guide the fuzzing process to generate a test suite for a DNN. We extensively compare the test suite generated by CoDoFuzz with those generated using several state-of-the-art coverage-based fuzz testing methods for the DNNs trained on six publicly available datasets. Experimental results establish the efficiency and efficacy of CoDoFuzz in generating the largest number of misclassified inputs and the inputs for which the model lacks confidence in its decision.

Paper Structure

This paper contains 17 sections, 4 figures, 7 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Distribution of maximum predicted probability, predicted classes, and discretized co-domain space.
  • Figure 2: Co-Domain Coverage: Images are given as input to the black-box DNN model, which outputs a tuple of the predicted class and its probability. Based on these output tuples, the inputs are mapped to cells in the co-domain of the DNN.
  • Figure 3: Images from the test suite crafted using CoDoFuzz. The initial seed images are labeled as original (in blue), with their ground-truth class shown in green. The model's wrong predictions on the transformed images are displayed in red.
  • Figure 4: Correlation between the coverage achieved by CDC and the number of selected erroneous inputs.
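
The captions for Figures 1 and 2 describe mapping each model output, a tuple of predicted class and probability, to a cell of a discretized co-domain. The following is a minimal sketch of that idea, not the authors' implementation. The equal-width binning of the probability axis, the default of 10 bins, and the names `cdc_cell` and `cdc_coverage` are all assumptions made for illustration; the paper's actual discretization may differ.

```python
from typing import Iterable, Tuple


def cdc_cell(pred_class: int, prob: float, num_bins: int = 10) -> Tuple[int, int]:
    """Map an output tuple (predicted class, probability) to a co-domain cell.

    Assumption: the probability axis is split into `num_bins` equal-width bins.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    # prob == 1.0 would index one past the end, so clamp it into the last bin.
    bin_index = min(int(prob * num_bins), num_bins - 1)
    return (pred_class, bin_index)


def cdc_coverage(
    outputs: Iterable[Tuple[int, float]], num_classes: int, num_bins: int = 10
) -> float:
    """Fraction of (class, probability-bin) cells hit by at least one output."""
    covered = {cdc_cell(cls, prob, num_bins) for cls, prob in outputs}
    return len(covered) / (num_classes * num_bins)


# Hypothetical black-box outputs: only the predicted class and its probability
# are needed, never the network's internal activations.
outputs = [(0, 0.95), (0, 0.97), (1, 0.55), (2, 1.0)]
coverage = cdc_coverage(outputs, num_classes=3)
```

In this sketch the first two outputs fall into the same cell, so only three of the thirty cells are covered. Because the maximum predicted probability of a K-class model is at least 1/K, the lowest bins can never be reached under this particular binning; a real criterion would likely account for that.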