Global Clipper: Enhancing Safety and Reliability of Transformer-based Object Detection Models
Qutub Syed Sha, Michael Paulitsch, Karthik Pattabiraman, Korbinian Hagn, Fabian Oboril, Cornelius Buerkle, Kay-Ulrich Scholl, Gereon Hinz, Alois Knoll
TL;DR
Transformer-based object detectors face safety risks from soft errors during inference, and CNN-centered range restrictions are insufficient for these architectures. The authors propose Global Clipper and Global Hybrid Clipper to extend range restriction to activation and linear layers within self-attention blocks, validated by a large-scale fault-injection study across two transformers (DINO-DETR, Lite-DETR) and two CNNs on three datasets, totaling ~3.3 million inferences. The results show near-zero faulty inferences for transformers, with Global Clipper outperforming prior methods like Ranger and Clipper; the Hybrid variant is essential for certain architectures. The work provides practical, low-overhead defenses and deepens understanding of attention-block vulnerability, with potential extensions to semantic segmentation and video tracking in safety-critical deployments.
Abstract
As transformer-based object detection models progress, their impact in critical sectors like autonomous vehicles and aviation is expected to grow. Soft errors that cause bit flips during inference can significantly degrade DNN performance and alter predictions. Traditional range restriction solutions designed for CNNs fall short for transformers. This study introduces the Global Clipper and Global Hybrid Clipper, effective mitigation strategies specifically designed for transformer-based models. They significantly enhance the resilience of these models to soft errors and reduce faulty inferences to ~0%. We also detail extensive testing across more than 64 scenarios involving two transformer models (DINO-DETR and Lite-DETR) and two CNN models (YOLOv3 and SSD) using three datasets, totalling approximately 3.3 million inferences, to assess model robustness comprehensively. Moreover, the paper explores unique aspects of attention blocks in transformers and their operational differences from CNNs.
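To make the two ideas in the abstract concrete, the following minimal sketch shows how a single bit flip can corrupt a float32 activation and how a range restriction step can suppress it. It is an illustration under stated assumptions, not the authors' implementation: the helper names `flip_bit` and `clip_to_zero`, the bounds `low` and `high`, and the policy of replacing out-of-range, NaN, and Inf values with zero are all assumed here. In practice the bounds would be profiled from fault-free runs, and a clamping variant would saturate to the nearest bound instead of zeroing.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) in the float32 encoding of value."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def clip_to_zero(values, low, high):
    """Assumed policy: replace NaN, Inf, and out-of-range entries with 0.0."""
    return [v if math.isfinite(v) and low <= v <= high else 0.0 for v in values]


# Hypothetical layer output and hypothetical profiled bounds.
activations = [0.5, -1.25, 2.0]
low, high = -4.0, 4.0

# Inject a soft error: flip the most significant exponent bit of the first value.
faulty = list(activations)
faulty[0] = flip_bit(faulty[0], 30)

# Range restriction applied to the layer output before it propagates further.
restricted = clip_to_zero(faulty, low, high)
```

Flipping a high exponent bit turns a small activation into a very large one, which is the kind of outlier a range restriction layer is meant to catch. In a real detector the same check would be attached to the outputs of the monitored layers, for example the activation and linear layers inside the self-attention blocks that the abstract refers to.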
