When Radiation Meets Linux: Analyzing Soft Errors in Linux on COTS SoCs under Proton Irradiation
Saad Memon, Rafal Graczyk, Tomasz Rajkowski, Jan Swakon, Damian Wrobel, Sebastian Kusyk, Seth Roffe, Mike Papadakis
TL;DR
This work addresses the reliability of unmodified Linux running on commercial SoCs in space by systematically probing soft errors with proton irradiation across three platforms (an ARM Cortex-A53 on 40 nm CMOS, an ARM Cortex-A53 on 14 nm FinFET, and a RISC-V softcore on an FPGA). It combines aggressive stress testing with targeted irradiation to quantify Linux-level SEE vulnerability through Linux-SEFI cross-sections and to identify vulnerable kernel subsystems. The study finds that the 14 nm FinFET i.MX 8M Plus achieved 2–3× longer Linux uptime without ECC than the 40 nm CMOS devices, while eMMC storage emerges as a major reliability bottleneck, showing that peripherals can limit system resilience. The results provide foundational data and concrete mitigation directions (software hardening, MC/DMR/TMR strategies, ECC adoption, and hardware–software co-design) to inform space mission readiness for COTS-based Linux systems.
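As a rough illustration of the Linux-SEFI cross-section mentioned above, the sketch below assumes the conventional single-event-effect definition: the number of observed events divided by the particle fluence delivered to the device, expressed in cm². The function name and the event and fluence values are hypothetical and are not taken from the study; the paper's exact counting procedure is not described in this section.

```python
def sefi_cross_section(num_events: int, fluence_p_per_cm2: float) -> float:
    """Return the event cross-section in cm^2 (events per unit fluence).

    Assumes the conventional definition sigma = N / Phi, where N is the
    number of observed Linux-level failures and Phi is the proton fluence
    in protons/cm^2. This is an illustrative helper, not the authors' code.
    """
    if fluence_p_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return num_events / fluence_p_per_cm2


# Hypothetical run: 12 Linux crashes observed over 3.0e10 protons/cm^2.
events = 12
fluence = 3.0e10
sigma = sefi_cross_section(events, fluence)
print(f"{sigma:.2e} cm^2")  # prints "4.00e-10 cm^2"
```

A smaller cross-section means fewer Linux-level failures for the same delivered fluence, which is why it can serve as a device-to-device comparison metric alongside uptime.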
Abstract
The increasing use of Linux on commercial off-the-shelf (COTS) systems-on-chip (SoCs) in spaceborne computing inherits the susceptibility of COTS parts to radiation-induced failures such as soft errors. Modern SoCs exacerbate this issue: aggressive transistor scaling lowers the critical charge needed to induce a soft error and intensifies radiation effects within densely packed transistors, degrading overall reliability. Linux's monolithic architecture amplifies these risks, as tightly coupled kernel subsystems propagate errors to critical components (e.g., memory management), while limited error-correcting code (ECC) coverage offers minimal mitigation. Furthermore, the lack of public soft error data from irradiation tests on COTS SoCs running Linux hinders reliability improvements. This study evaluates proton irradiation effects (20–50 MeV) on Linux across three COTS SoC architectures: Raspberry Pi Zero 2 W (40 nm CMOS, Cortex-A53), NXP i.MX 8M Plus (14 nm FinFET, Cortex-A53), and OrangeCrab (40 nm FPGA, RISC-V). Irradiation results show that the 14 nm FinFET NXP SoC achieved 2–3× longer Linux uptime without ECC memory than both 40 nm CMOS counterparts, partially due to FinFET's reduced charge collection. Additionally, this work presents the first cross-architecture analysis of soft error-prone Linux kernel components in modern SoCs, with the aim of developing targeted mitigations. The findings establish foundational data on Linux's soft error sensitivity in COTS SoCs, guiding mission readiness for space applications.
