BEC: Bit-Level Static Analysis for Reliability against Soft Errors
Yousun Ko, Bernd Burgstaller
TL;DR
This work tackles the reliability challenge posed by soft errors in digital systems by introducing BEC, a static, bit-level analysis that tracks every fault site in the space $F = P \times V$ and classifies the effects of bit flips at compile time. It combines a forward Global Abstract Bit Value Analysis with a backward Bit-level Fault Index Coalescing Analysis to map bit corruptions to program semantics and to merge fault sites into equivalence classes. These classes enable two use cases: pruning of fault-injection (FI) campaigns and reliability-aware instruction scheduling. Implemented in LLVM 16.0.0 for the RISC-V architecture, BEC prunes up to $30.04\%$ of fault-injection runs (average $13.71\%$) and reduces program vulnerability by up to $13.11\%$ (average $4.94\%$) across eight benchmarks, without increasing the instruction count or the number of FI runs. The results demonstrate the practical value of bit-level static analysis for software-level reliability and motivate extensions to hardware synthesis and broader architecture coverage.
Abstract
Soft errors are a type of transient digital signal corruption that occurs in digital hardware components such as the internal flip-flops of CPU pipelines, the register file, memory cells, and even internal communication buses. Soft errors are caused by environmental radioactivity, magnetic interference, lasers, and temperature fluctuations, either unintentionally or as part of a deliberate attempt to compromise a system and expose confidential data. We propose a bit-level error coalescing (BEC) static program analysis and two use cases for it, to understand and improve program reliability against soft errors. The BEC analysis tracks each bit corruption in the register file and classifies the effect of the corruption by its semantics at compile time. The usefulness of the proposed analysis is demonstrated in two scenarios: fault injection campaign pruning and reliability-aware program transformation. Experimental results show that bit-level analysis pruned up to 30.04% of exhaustive fault injection campaigns (13.71% on average), without loss of accuracy. Program vulnerability was reduced by up to 13.11% (4.94% on average) through bit-level vulnerability-aware instruction scheduling. The analysis has been implemented within LLVM and evaluated on the RISC-V architecture. To the best of our knowledge, the proposed BEC analysis is the first bit-level compiler analysis for program reliability against soft errors. The proposed method is generic and not limited to a specific computer architecture.
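
To make the idea of classifying bit flips at compile time concrete, the following toy sketch considers a single operation, `y = x & mask`, on a hypothetical 8-bit register. It is not the BEC algorithm and does not use LLVM; the function names, the register width, and the choice of operation are assumptions made purely for illustration. The sketch derives statically which bits of `x` can be flipped without changing `y`, then confirms the result by exhaustively injecting every single-bit fault.

```python
WIDTH = 8  # assumed register width, chosen small so brute force is cheap


def masked_bits_of_and(mask, width=WIDTH):
    """Static reasoning: a flip in bit b of x cannot change x & mask
    whenever bit b of the constant mask is 0."""
    return [b for b in range(width) if not (mask >> b) & 1]


def brute_force_masked_bits(op, width=WIDTH):
    """Exhaustive fault injection: bit b is masked if flipping it
    leaves op(x) unchanged for every possible input x."""
    masked = []
    for b in range(width):
        if all(op(x) == op(x ^ (1 << b)) for x in range(1 << width)):
            masked.append(b)
    return masked


mask = 0x0F
static_result = masked_bits_of_and(mask)
injected_result = brute_force_masked_bits(lambda x: x & mask)

# Fraction of single-bit fault sites in x that need no injection run.
pruned_fraction = len(static_result) / WIDTH

print("statically masked bits:", static_result)
print("masked bits by injection:", injected_result)
print("pruned fraction:", pruned_fraction)
```

In this toy setting the static reasoning and the exhaustive injection agree, so the fault sites in the upper four bits of `x` could be skipped without running them. A real analysis must additionally handle control flow, multiple uses of a value, and non-constant operands, which is where the forward and backward analyses described in the TL;DR come in.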
