Exploring Robustness of Image Recognition Models on Hardware Accelerators

Nikolaos Louloudakis, Perry Gibson, José Cano, Ajitha Rajan

TL;DR

MutateNN addresses robustness verification of image recognition DNNs deployed on hardware accelerators by combining mutation and differential testing to reveal faults in device code and compiler optimizations. It generates graph- and code-level mutants, compiles them with Apache TVM, and executes inference across multiple devices, comparing outputs with diverse metrics. Key findings include large output discrepancies up to 90.3% for conditional mutations and severe degradation or crashes due to numeric-precision changes, underscoring cross-device vulnerability. The work provides a configurable, extensible framework for hardware-aware DNN testing with practical implications for test suites, fail-safes, and deployment decisions.

Abstract

As the use of Artificial Intelligence (AI) in resource-intensive and safety-critical tasks increases, a variety of Machine Learning (ML) compilers have been developed, enabling Deep Neural Networks (DNNs) to run on a range of hardware acceleration devices. However, given that DNNs are widely used for challenging and demanding tasks, the behavior of these compilers must be verified. To this end, we propose MutateNN, a tool that combines elements of differential and mutation testing to examine the robustness of image recognition models deployed on hardware accelerators with different capabilities, in the presence of faults in their target device code, introduced either by developers or by problems in the compilation process. We focus on the image recognition domain by applying mutation testing to 7 well-established DNN models, introducing 21 mutations of 6 different categories. We deployed our mutants on 4 different hardware acceleration devices of varying capabilities and observed that DNN models presented discrepancies of up to 90.3% across devices in mutants related to conditional operators. We also observed that mutations related to layer modification, arithmetic types and input severely affected overall model performance (up to 99.8%) or led to model crashes, consistently across devices.
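The cross-device comparison described above can be illustrated with a minimal sketch. It assumes that a "discrepancy" means the share of images whose top-1 predicted label differs between two executions; the abstract does not define the metric, so this definition, the function names `top1` and `discrepancy_pct`, and the sample scores are all hypothetical and not taken from MutateNN.

```python
from typing import Sequence


def top1(scores: Sequence[float]) -> int:
    """Return the index of the highest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def discrepancy_pct(
    outputs_a: Sequence[Sequence[float]],
    outputs_b: Sequence[Sequence[float]],
) -> float:
    """Percentage of images whose top-1 label differs between two runs.

    Assumption: each argument holds one score vector per image, in the
    same image order, e.g. the same mutant executed on two devices.
    """
    if len(outputs_a) != len(outputs_b):
        raise ValueError("both runs must cover the same number of images")
    if not outputs_a:
        return 0.0
    differing = sum(top1(a) != top1(b) for a, b in zip(outputs_a, outputs_b))
    return 100.0 * differing / len(outputs_a)


# Hypothetical class scores for four images on two devices.
device_a = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
device_b = [[0.1, 0.7, 0.2], [0.2, 0.7, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]

# Only the second image changes its top-1 label (0 on A, 1 on B).
print(discrepancy_pct(device_a, device_b))  # 25.0
```

Comparing only top-1 labels keeps the sketch small; a fuller comparison might also look at top-k rankings or raw score differences.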

Paper Structure (21 sections, 6 figures)

Figures (6)

  • Figure 1: Architecture of MutateNN: (1) Model Variants Generator generates mutations and compiles them to device code; (2) Mutations Execution executes the various mutants on images from a target dataset; and (3) Analysis compares inference outputs and reports metrics across mutant executions.
  • Figure 2: Implementation of the operator mutation generation in TIR pass.
  • Figure 3: Implementation of the activation function replacement mutation generation in Relay IR pass.
  • Figure 4: Injected tensor transposition in Relay IR.
  • Figure 5: Mutation of a conditional operator in TIR for a fused operation in MobileNetV2.
  • ...and 1 more figure