SAFFIRA: a Framework for Assessing the Reliability of Systolic-Array-Based DNN Accelerators
Mahdi Taheri, Masoud Daneshtalab, Jaan Raik, Maksim Jenihhin, Salvatore Pappalardo, Paul Jimenez, Bastien Deveautour, Alberto Bosio
TL;DR
SAFFIRA tackles reliability assessment for systolic-array (SA) DNN accelerators by introducing a hierarchical, software-based, hardware-aware fault-injection (FI) flow that uses Uniform Recurrent Equations (UREs) to model the SA core. The method speeds up fault injection, supports multiple data representations, and introduces a novel faulty-distance metric to quantify resilience, all implemented as an open-source tool with PyTorch integration. Empirical evaluation on LeNet-5 and larger CNNs demonstrates significant FI-time reductions (up to 3x vs. hybrid FI and up to 2000x vs. RTL) while preserving accuracy, highlighting practical impact for safety-critical deployments. The work provides a comprehensive framework for reliability analysis of DNN accelerators, including a formalization of fault propagation, a versatile data-path model, and a path toward broader hardware-system integration.
Abstract
The systolic array has emerged as a prominent architecture for Deep Neural Network (DNN) hardware accelerators, providing the high throughput and low latency essential for deploying DNNs across diverse applications. However, when DNN accelerators are used in safety-critical applications, reliability assessment is mandatory to guarantee their correct behavior. While fault injection stands out as a well-established, practical, and robust method for reliability assessment, it remains a very time-consuming process. This paper addresses the time-efficiency issue by introducing a novel hierarchical, software-based, hardware-aware fault injection strategy tailored for systolic-array-based DNN accelerators.
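To make the idea of software-based, hardware-aware fault injection concrete, the following is a minimal sketch and not SAFFIRA's actual implementation or API. It assumes an output-stationary array in which processing element (i, j) accumulates c(i,j,k) = c(i,j,k-1) + a(i,k) * b(k,j), which is the textbook recurrence for matrix multiplication, and it models a transient fault as a single bit flip in one accumulator right after a chosen step. The function name `systolic_matmul` and the `fault` tuple layout are hypothetical, chosen only for illustration.

```python
def systolic_matmul(A, B, fault=None):
    """Evaluate C = A @ B with the recurrence c(i,j,k) = c(i,j,k-1) + a(i,k)*b(k,j).

    fault (assumed, illustrative format): optional tuple (i, j, k, bit).
    After step k, bit `bit` of the accumulator of PE (i, j) is flipped once,
    imitating a transient fault in that processing element.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]          # one accumulator per PE
    for k in range(m):                        # recurrence (time) index
        for i in range(n):                    # PE row
            for j in range(p):                # PE column
                C[i][j] += A[i][k] * B[k][j]  # multiply-accumulate step
                if fault is not None and fault[:3] == (i, j, k):
                    C[i][j] ^= 1 << fault[3]  # inject the bit flip
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

golden = systolic_matmul(A, B)                      # fault-free reference run
faulty = systolic_matmul(A, B, fault=(0, 1, 0, 3))  # flip bit 3 in PE (0, 1) after step 0

# Collect the outputs that differ from the golden run.
corrupted = [(i, j) for i in range(2) for j in range(2)
             if golden[i][j] != faulty[i][j]]
print(golden, faulty, corrupted)
```

In this toy setting the fault stays confined to the output of the PE that was hit, because each accumulator is private to its PE. Comparing a faulty run against a golden run in this way is the basic step that any resilience metric builds on.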
