Global Clipper: Enhancing Safety and Reliability of Transformer-based Object Detection Models

Qutub Syed Sha, Michael Paulitsch, Karthik Pattabiraman, Korbinian Hagn, Fabian Oboril, Cornelius Buerkle, Kay-Ulrich Scholl, Gereon Hinz, Alois Knoll

TL;DR

Transformer-based object detectors face safety risks from soft errors during inference, and CNN-centered range restrictions are insufficient for these architectures. The authors propose Global Clipper and Global Hybrid Clipper to extend range restriction to activation and linear layers within self-attention blocks, validated by a large-scale fault-injection study across two transformers (DINO-DETR, Lite-DETR) and two CNNs on three datasets, totaling ~3.3 million inferences. The results show near-zero faulty inferences for transformers, with Global Clipper outperforming prior methods like Ranger and Clipper; the Hybrid variant is essential for certain architectures. The work provides practical, low-overhead defenses and deepens understanding of attention-block vulnerability, with potential extensions to semantic segmentation and video tracking in safety-critical deployments.
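The range-restriction idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `profile_bounds`, `ranger_clamp`, and `clipper_zero` are hypothetical, and the sketch assumes the commonly described behavior of the prior methods (Ranger saturates out-of-range values at the bound, Clipper replaces them with zero). Bounds are assumed to come from profiling fault-free outputs of a layer; in the spirit of Global Clipper, the same check would be applied to the outputs of linear layers inside self-attention blocks as well as to activation layers.

```python
def profile_bounds(fault_free_outputs):
    """Return (lower, upper) observed over fault-free outputs of one layer.

    `fault_free_outputs` is a list of lists of floats, one inner list per
    profiled inference (hypothetical data layout, chosen for illustration).
    """
    lower = min(min(batch) for batch in fault_free_outputs)
    upper = max(max(batch) for batch in fault_free_outputs)
    return lower, upper


def ranger_clamp(values, lower, upper):
    """Ranger-style restriction: saturate out-of-range values at the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


def clipper_zero(values, lower, upper):
    """Clipper-style restriction: replace out-of-range values with zero.

    Comparisons with NaN are false, so a NaN is also mapped to zero here.
    """
    return [v if lower <= v <= upper else 0.0 for v in values]


# Profile bounds on fault-free outputs, then restrict a corrupted output vector.
lower, upper = profile_bounds([[0.1, -0.5, 2.0], [1.5, -1.0, 0.3]])
faulty = [0.2, 3.4e38, -7.0, 1.9]   # two values corrupted by a simulated fault
clamped = ranger_clamp(faulty, lower, upper)
zeroed = clipper_zero(faulty, lower, upper)
```

The design difference is that clamping keeps a large (but bounded) value in the corrupted position, while zeroing discards it entirely; which one works better for a given layer is an empirical question that the paper's fault-injection study addresses.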

Abstract

As transformer-based object detection models progress, their impact in critical sectors like autonomous vehicles and aviation is expected to grow. Soft errors causing bit flips during inference have significantly impacted DNN performance, altering predictions. Traditional range restriction solutions for CNNs fall short for transformers. This study introduces the Global Clipper and Global Hybrid Clipper, effective mitigation strategies specifically designed for transformer-based models. They significantly enhance the models' resilience to soft errors and reduce faulty inferences to ~0%. We also detail extensive testing across over 64 scenarios involving two transformer models (DINO-DETR and Lite-DETR) and two CNN models (YOLOv3 and SSD) using three datasets, totalling approximately 3.3 million inferences, to assess model robustness comprehensively. Moreover, the paper explores unique aspects of attention blocks in transformers and their operational differences from CNNs.
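To make the soft-error model concrete, the following sketch flips a single bit in the IEEE 754 single-precision encoding of a value. It is an illustration only and not the paper's fault-injection tooling; the helper name `flip_bit_float32` is hypothetical, and 32-bit floating-point storage is an assumption made for the example.

```python
import struct


def flip_bit_float32(value, bit):
    """Flip one bit (0 = least significant, 31 = sign) of a float32 encoding."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-bit mask toggles exactly that bit.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


original = 0.5
exponent_flip = flip_bit_float32(original, 30)  # most significant exponent bit
sign_flip = flip_bit_float32(original, 31)      # sign bit
mantissa_flip = flip_bit_float32(original, 0)   # least significant mantissa bit
```

A flip in the most significant exponent bit turns 0.5 into 2^127 (about 1.7e38), whereas a flip in the lowest mantissa bit changes the value by less than 1e-6. This asymmetry is why restricting layer outputs to a profiled range can mask the most damaging faults.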

Paper Structure (12 sections, 4 equations, 13 figures)

Figures (13)

  • Figure 1: Abstract architecture of a DNN accelerator. The upper figure illustrates potential soft errors resulting in bit flips within neurons or weights at specific layers of the DNN model. The lower figure compares the mean layer values of non-faulty inferences with those of faulty inferences when a bit-flip error is injected at the 50th layer of the transformer model DINO-DETR.
  • Figure 2: Visual example of faulty inferences on a COCO-trained DINO-DETR model due to bit flips caused by soft errors.
  • Figure 3: Integrating Global Clipper layers into transformer-based object detection models' self-attention blocks. *Ranger layers are recommended to be added to activation functions, usually at ReLU layers, not SoftMax.
  • Figure 4: Lower and upper bounds for range restrictions, encompassing activation layers and linear layers within the self-attention blocks of the DINO-DETR model, are defined by the Global Clipper technique.
  • Figure 5: Tracking the mean and variance of layers within the DINO-DETR model, this illustration focuses on the ReLU activation layer and linear layer within the self-attention block.
  • ...and 8 more figures