Soft Error Reliability Analysis of Vision Transformers

Xinghua Xue; Cheng Liu; Ying Wang; Bing Yang; Tao Luo; Lei Zhang; Huawei Li; Xiaowei Li

TL;DR

This paper analyzes the soft error reliability of Vision Transformers (ViTs) at model, layer, module, and patch granularity. It introduces an operation-wise fault-injection framework and finds that ViTs are generally resilient in linear computations (GEMM/FC) but more vulnerable in non-linear operations (softmax, GELU). To mitigate these vulnerabilities, it proposes a lightweight block-wise ABFT (LB-ABFT) for linear operations and a range-based protection scheme for non-linear operations, achieving significant accuracy gains with modest overhead. Experiments on four ViT variants under bit-flip faults with bit error rates (BER) in $[1\times 10^{-11}, 1\times 10^{-7}]$ demonstrate robustness improvements and favorable overhead trade-offs.
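To make the bit-flip fault model concrete, the following is a minimal sketch of BER-driven fault injection on a list of values standing in for the outputs of one operation. It is not the paper's framework: the float32 encoding, the per-bit independent flip model, and the helper names `flip_bits` and `inject_bit_flips` are assumptions made for illustration, since this summary does not state the data format or injection details.

```python
import random
import struct


def flip_bits(value: float, mask: int) -> float:
    """XOR the float32 bit pattern of `value` with `mask` (assumed encoding)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ mask))
    return flipped


def inject_bit_flips(values, ber, rng):
    """Flip each of the 32 bits of every value independently with probability `ber`.

    Returns the (possibly corrupted) values and the number of injected flips.
    """
    corrupted, n_flips = [], 0
    for v in values:
        mask = 0
        for bit in range(32):
            if rng.random() < ber:
                mask |= 1 << bit
                n_flips += 1
        corrupted.append(flip_bits(v, mask) if mask else v)
    return corrupted, n_flips


# A single flip can be harmless or catastrophic depending on its position.
sign_flip = flip_bits(1.0, 1 << 31)      # sign bit: 1.0 becomes -1.0
exponent_flip = flip_bits(1.0, 1 << 30)  # top exponent bit: 1.0 becomes inf

# An exaggerated BER is used here so that flips are visible on a tiny input.
outputs, flips = inject_bit_flips([0.25, 1.0, -3.5], ber=0.05, rng=random.Random(0))
```

The per-bit Python loop is far too slow for the BER range quoted above on real tensors; a practical injector would sample the number of flips and their positions directly instead of testing every bit.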

Abstract

Vision Transformers (ViTs), which leverage the self-attention mechanism, have shown superior performance on many classical vision tasks compared to convolutional neural networks (CNNs) and have recently gained increasing popularity. Existing work on ViTs mainly optimizes performance and accuracy, but ViT reliability issues induced by soft errors in large-scale VLSI designs have generally been overlooked. In this work, we study the reliability of ViTs and, for the first time, investigate their vulnerability at different architecture granularities: models, layers, modules, and patches. The investigation reveals that ViTs with the self-attention mechanism are generally more resilient in linear computing, including general matrix-matrix multiplication (GEMM) and fully connected (FC) layers, and show a relatively even vulnerability distribution across patches. However, ViTs involve more fragile non-linear computing, such as softmax and GELU, than typical CNNs. Based on these observations, we propose a lightweight block-wise algorithm-based fault tolerance (LB-ABFT) approach to protect the linear computing implemented with GEMMs of distinct sizes, and apply a range-based protection scheme to mitigate soft errors in non-linear computing. According to our experiments, the proposed fault-tolerant approaches enhance ViT accuracy significantly with minor computing overhead in the presence of various soft errors.
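The two protection ideas named in the abstract can be illustrated with a small sketch. The code below shows classic full-matrix checksum ABFT for a GEMM, not the paper's block-wise LB-ABFT variant, together with a simple clamp as one possible form of range-based protection. The function names, the tolerance, and the clamp bounds are hypothetical; the paper's block sizes and range policy are not given in this summary.

```python
def matmul(a, b):
    """Plain GEMM on nested lists: C = A x B."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def abft_check(a, b, c, tol=1e-6):
    """Checksum test: the column sums of C must equal (column sums of A) x B.

    Computing the expected checksum costs one vector-matrix product,
    which is much cheaper than recomputing the whole GEMM.
    """
    inner, cols = len(b), len(b[0])
    col_sum_a = [sum(row[k] for row in a) for k in range(inner)]
    expected = [sum(col_sum_a[k] * b[k][j] for k in range(inner))
                for j in range(cols)]
    actual = [sum(row[j] for row in c) for j in range(cols)]
    return all(abs(e - x) <= tol for e, x in zip(expected, actual))


def clamp_to_range(values, lo, hi):
    """Range-based protection sketch: pull out-of-range values back into [lo, hi].

    The bounds would come from profiling fault-free runs (assumed here).
    """
    return [min(max(v, lo), hi) for v in values]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(a, b)
clean_ok = abft_check(a, b, c)          # True: checksums agree

c_faulty = [row[:] for row in c]
c_faulty[0][1] += 64                    # emulate a bit flip in one output
faulty_ok = abft_check(a, b, c_faulty)  # False: the column checksum no longer matches

protected = clamp_to_range([0.5, float("inf"), -9.0], lo=-4.0, hi=4.0)
```

A detected mismatch only signals that the GEMM output is corrupted; what happens next (recomputation, correction from row and column checksums, or block-level recovery) is a separate design choice that this sketch leaves open.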

Paper Structure (16 sections, 1 equation, 14 figures, 1 table, 2 algorithms)

Introduction
Background and Related Work
Vision Transformers
Reliability Analysis of Deep Learning
Fault Tolerance of Deep Learning
ViTs Reliability Evaluation
Evaluation Setups
Model-wise Reliability Evaluation
Layer-wise Reliability Evaluation
Module-wise Reliability Evaluation
Patch-wise Reliability Evaluation
Fault-Tolerant Approaches for ViTs
Fault-tolerant Approach for Linear Operations
Fault-tolerant Approach for Non-Linear Operations
Evaluation
...and 1 more section

Figures (14)

Figure 1: Typical Vision Transformer Architecture.
Figure 2: Top-1 model accuracy under different BER setups.
Figure 3: Layer-wise vulnerability factors.
Figure 4: Module-wise vulnerability factors.
Figure 5: Heatmap of patch-wise vulnerability factors.
...and 9 more figures
