Periodic Online Testing for Sparse Systolic Tensor Arrays

Christodoulos Peltekis; Chrysostomos Nicopoulos; Giorgos Dimitrakopoulos

TL;DR

This work addresses the reliability of structured-sparse ML accelerators with a periodic online self-test that reuses the weights already loaded into the array, together with only four test vectors, to detect permanent faults before computation begins. The method also localizes faults to a specific column and keeps latency and hardware overhead low, which makes it suitable for safety-critical deployments. Gate-level fault-injection campaigns on three CNN benchmarks show high fault coverage (about 94.2% on average) with modest runtime overhead (0.5%–2%) and area overhead (about 3%); coverage cannot reach 100% because the test is confined to the fixed weight paths. Overall, the approach offers a practical, lightweight fault-detection mechanism for sparse systolic tensor arrays, supporting safer edge ML inference in automotive, medical, and aerospace applications.
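
A minimal sketch of the column-level check described above, in a toy software model. Every detail here is an assumption rather than the paper's design: the array is modeled as weight-stationary, with column c holding a preloaded weight vector and producing its dot product with the streamed input; fault-free reference outputs are computed in software from those same weights; the four patterns returned by the hypothetical `hypothetical_test_vectors` are illustrative, not the paper's vectors; and the hypothetical `make_faulty_array` injects a permanent fault by forcing one PE's product to 0. Any column whose output disagrees with its reference is flagged.

```python
from typing import Callable, List, Sequence, Set

Vector = List[int]

def golden_outputs(weights: Sequence[Vector], x: Vector) -> Vector:
    """Fault-free output of each column: dot product of its preloaded weights with x."""
    return [sum(w * xi for w, xi in zip(column, x)) for column in weights]

def hypothetical_test_vectors(rows: int) -> List[Vector]:
    """Four illustrative input patterns (an assumption, not the paper's vectors)."""
    ones = [1] * rows
    minus_ones = [-1] * rows
    even_rows = [1 if r % 2 == 0 else 0 for r in range(rows)]
    odd_rows = [1 if r % 2 == 1 else 0 for r in range(rows)]
    return [ones, minus_ones, even_rows, odd_rows]

def make_faulty_array(weights: Sequence[Vector], row: int, col: int) -> Callable[[Vector], Vector]:
    """Hypothetical permanent fault model: the product in PE (row, col) is stuck at 0."""
    def run(x: Vector) -> Vector:
        out = golden_outputs(weights, x)
        out[col] -= weights[col][row] * x[row]  # drop the faulty PE's contribution
        return out
    return run

def locate_faulty_columns(weights: Sequence[Vector],
                          run_array: Callable[[Vector], Vector],
                          test_vectors: Sequence[Vector]) -> Set[int]:
    """Apply every test vector and report the columns whose outputs mismatch."""
    faulty: Set[int] = set()
    for x in test_vectors:
        expected = golden_outputs(weights, x)
        observed = run_array(x)
        for col, (e, o) in enumerate(zip(expected, observed)):
            if e != o:
                faulty.add(col)
    return faulty

if __name__ == "__main__":
    # Three columns of four preloaded weights each (toy values).
    weights = [[2, 0, -1, 3], [1, 1, 1, 1], [0, 4, 0, -2]]
    vectors = hypothetical_test_vectors(rows=4)
    print(sorted(locate_faulty_columns(weights, make_faulty_array(weights, 0, 1), vectors)))  # [1]
    print(sorted(locate_faulty_columns(weights, make_faulty_array(weights, 0, 2), vectors)))  # []
```

In this toy model the same stuck-at-0 fault goes unnoticed in a PE whose preloaded weight is 0, because that PE's product is 0 in the fault-free case as well. This is one simple way a test that reuses fixed weights can leave some faults undetected.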

Abstract

Modern Machine Learning (ML) applications often benefit from structured sparsity, a technique that efficiently reduces model complexity and simplifies the handling of sparse data in hardware. Sparse systolic tensor arrays, which are specifically designed to accelerate these structured-sparse ML models, play a pivotal role in enabling efficient computation. As ML is increasingly integrated into safety-critical systems, ensuring the reliability of these systems is of paramount importance. This paper introduces an online error-checking technique capable of detecting and locating permanent faults within sparse systolic tensor arrays before computation begins. The new technique relies on merely four test vectors and exploits the weight values already loaded within the systolic array to comprehensively test the system. Fault-injection campaigns on the gate-level netlist, while executing three well-established Convolutional Neural Networks (CNNs), validate the effectiveness of the proposed approach, which is shown to achieve very high fault coverage while incurring minimal performance and area overheads.
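
To illustrate why structured sparsity simplifies the handling of sparse data in hardware, here is a small sketch assuming a 2:4 pattern (at most two nonzero weights in every group of four), a common choice; the abstract does not say which N:M ratio the targeted arrays use. Only the kept values and a 2-bit in-group index are stored, and because every group has the same number of slots, a sparse PE can pick the matching activation with a small fixed-size selector instead of general sparse indexing. `compress_n_m` and `sparse_dot` are hypothetical helper names, not part of the paper.

```python
from typing import List, Tuple

def compress_n_m(dense: List[int], n: int = 2, m: int = 4) -> Tuple[List[int], List[int]]:
    """Store exactly n (value, in-group index) slots per group of m weights."""
    if len(dense) % m != 0:
        raise ValueError("length must be a multiple of m")
    values: List[int] = []
    indices: List[int] = []
    for start in range(0, len(dense), m):
        group = dense[start:start + m]
        kept = [(i, v) for i, v in enumerate(group) if v != 0]
        if len(kept) > n:
            raise ValueError(f"group at offset {start} violates the {n}:{m} pattern")
        kept += [(0, 0)] * (n - len(kept))  # pad with zero-valued slots
        for i, v in kept:
            indices.append(i)
            values.append(v)
    return values, indices

def sparse_dot(values: List[int], indices: List[int], activations: List[int],
               n: int = 2, m: int = 4) -> int:
    """Each stored weight selects its activation by in-group index (a mux in a sparse PE)."""
    total = 0
    for k, (v, i) in enumerate(zip(values, indices)):
        group = k // n
        total += v * activations[group * m + i]
    return total

if __name__ == "__main__":
    dense = [0, 3, 0, -1, 2, 0, 0, 0]
    acts = [1, 2, 3, 4, 5, 6, 7, 8]
    vals, idx = compress_n_m(dense)
    print(vals, idx)  # [3, -1, 2, 0] [1, 3, 0, 0]
    print(sparse_dot(vals, idx, acts), sum(w * a for w, a in zip(dense, acts)))  # 12 12
```

The zero-valued padding slots keep every group the same size, at the cost of an idle multiply whenever a group has fewer than two nonzeros.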
