Periodic Online Testing for Sparse Systolic Tensor Arrays

Christodoulos Peltekis, Chrysostomos Nicopoulos, Giorgos Dimitrakopoulos

TL;DR

This work tackles reliability for structured-sparse ML accelerators by introducing a periodic online self-test that reuses the array’s already-loaded weights and only four test vectors to detect permanent faults before computation begins. The method provides column-level fault localization and maintains low latency and hardware overhead, making it suitable for safety-critical deployments. Gate-level fault-injection across CNN benchmarks demonstrates high fault coverage (average ~94.2%) with modest runtime (0.5%–2%) and area (~3%) overhead, though 100% coverage is not achievable due to fixed weight paths. Overall, the approach offers a practical, lightweight mechanism for fault detection in sparse systolic tensor arrays, enabling safer edge ML inference in automotive, medical, and aerospace domains.

Abstract

Modern Machine Learning (ML) applications often benefit from structured sparsity, a technique that efficiently reduces model complexity and simplifies handling of sparse data in hardware. Sparse systolic tensor arrays, which are specifically designed to accelerate these structured-sparse ML models, play a pivotal role in enabling efficient computations. As ML is increasingly integrated into safety-critical systems, ensuring the reliability of these systems is of paramount importance. This paper introduces an online error-checking technique capable of detecting and locating permanent faults within sparse systolic tensor arrays before computation begins. The new technique relies on only four test vectors and exploits the weight values already loaded within the systolic array to comprehensively test the system. Fault-injection campaigns on the gate-level netlist, performed while executing three well-established Convolutional Neural Networks (CNNs), validate the efficiency of the proposed approach, which is shown to achieve very high fault coverage while incurring minimal performance and area overheads.
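
The idea of testing with the weights that are already loaded can be illustrated with a small behavioral sketch. This is a toy model under stated assumptions, not the paper's implementation: the array is modeled as a dense matrix product, the fault is a stuck-at-1 bit in one weight register, and the four vectors in `TEST_VECTORS` are placeholder patterns chosen for illustration (this summary does not list the actual test vectors). All function names are hypothetical.

```python
import numpy as np

# Placeholder patterns (assumption): NOT the four vectors used in the paper.
TEST_VECTORS = np.array([
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [2, 1, 2, 1],
])

def array_output(weights, x):
    """Toy weight-stationary model: column c accumulates sum_r x[r] * weights[r, c]."""
    return x @ weights

def inject_stuck_at_one(weights, row, col, bit):
    """Return a copy of the weights with one register bit forced to 1."""
    faulty = weights.copy()
    faulty[row, col] |= (1 << bit)
    return faulty

def run_self_test(golden_weights, hardware_weights, test_vectors):
    """Compare observed column outputs with expected ones; return mismatching columns."""
    faulty_cols = set()
    for x in test_vectors:
        expected = array_output(golden_weights, x)
        observed = array_output(hardware_weights, x)
        faulty_cols.update(np.flatnonzero(expected != observed).tolist())
    return sorted(faulty_cols)

# Weights "already loaded" in a hypothetical 4x3 array.
weights = np.array([[1, 2, 3],
                    [4, 5, 1],
                    [0, 7, 2],
                    [3, 0, 6]])

# A fault that changes a stored value is flagged in its column.
visible = inject_stuck_at_one(weights, row=1, col=2, bit=2)   # 1 becomes 5
detected_columns = run_self_test(weights, visible, TEST_VECTORS)

# A stuck-at-1 on a bit that is already 1 leaves the value unchanged,
# so this toy test cannot observe it.
masked = inject_stuck_at_one(weights, row=1, col=0, bit=2)    # 4 stays 4
masked_columns = run_self_test(weights, masked, TEST_VECTORS)
```

In this toy model the first fault is reported in column 2, while the second goes unnoticed because the loaded weight hides it. That is one plausible way in which relying on fixed, already-loaded weights can limit coverage, although the sketch does not reproduce the paper's fault model or its coverage figures.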

Paper Structure

This paper contains 9 sections, 4 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Example of (a) unstructured sparsity; and (b) structured block sparsity of 2:4 (i.e., up to 2 non-zero elements in every 4 consecutive elements in each column) and their respective packed storage with their associated bit masks.
  • Figure 2: A sparse systolic tensor array that employs the weight-stationary dataflow; i.e., the weights are pre-loaded into the Tensor Processing Elements (TPE) and remain stationary during the operations. The inputs and outputs flow in the horizontal (west-to-east) and vertical (north-to-south) directions, respectively.
  • Figure 3: The four types of registers in a single TPE of a sparse systolic array.
  • Figure 4: The four test vectors are applied periodically to the sparse systolic tensor array to detect permanent faults within any one register.
  • Figure 5: The fault coverage achieved after completion of each layer of ResNet50. The fault coverage converges quite rapidly to a high value.
  • ...and 1 more figure
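
The 2:4 structured block sparsity described in the caption of Figure 1 (at most 2 non-zero elements in every 4 consecutive elements of a column, stored in packed form with a bit mask) can be sketched as follows. This is a minimal illustration; the function names `pack_2_4` and `unpack_2_4` and the choice to pad unused slots with zeros are assumptions, not details taken from the paper.

```python
import numpy as np

def pack_2_4(column):
    """Pack a column that obeys 2:4 sparsity into two value slots plus a 4-bit mask per block."""
    column = np.asarray(column)
    if column.size % 4 != 0:
        raise ValueError("column length must be a multiple of 4")
    values, masks = [], []
    for block in column.reshape(-1, 4):
        nonzero_positions = np.flatnonzero(block)
        if nonzero_positions.size > 2:
            raise ValueError("block violates 2:4 sparsity")
        # Keep the non-zero values in order and pad to exactly two slots (assumption).
        kept = list(block[nonzero_positions]) + [0] * (2 - nonzero_positions.size)
        values.append(kept)
        # The mask records which of the 4 positions held a non-zero value.
        masks.append([int(v != 0) for v in block])
    return np.array(values), np.array(masks)

def unpack_2_4(values, masks):
    """Rebuild the original column from the packed values and bit masks."""
    out = np.zeros(masks.shape, dtype=values.dtype)
    for b, (vals, mask) in enumerate(zip(values, masks)):
        positions = np.flatnonzero(mask)
        out[b, positions] = vals[:positions.size]
    return out.reshape(-1)

column = [0, 3, 0, 5, 7, 0, 0, 0]
packed_values, packed_masks = pack_2_4(column)
restored = unpack_2_4(packed_values, packed_masks)
```

Here each block of four elements shrinks to two stored values, and the mask is what lets the hardware (or `unpack_2_4` in this sketch) recover where each value belongs.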