Safe-NEureka: a Hybrid Modular Redundant DNN Accelerator for On-board Satellite AI Processing

Riccardo Tedeschi; Luigi Ghionda; Alessandro Nadalini; Yvan Tortorella; Arpan Suravi Prasad; Luca Benini; Davide Rossi; Francesco Conti

TL;DR

Safe-NEureka addresses the need for reliable on-board satellite AI with a runtime-reconfigurable DNN accelerator that switches between a high-throughput mode and a fault-tolerant mode. It combines Hybrid Modular Redundancy with hardware-assisted rollback, ECC-protected memory, and a TMR-protected controller to provide end-to-end resilience in a heterogeneous RISC-V cluster. The GlobalFoundries $12\mathrm{nm}$ implementation shows a $\sim15\%$ area overhead and reduces faulty executions by $\sim96\%$ in redundancy mode, while the high-throughput mode stays close to baseline performance with only a small throughput and efficiency penalty. This mixed-criticality approach lets space missions confine the redundancy overhead to critical tasks while sustaining real-time AI processing. The authors also release the full RTL as open source.

Abstract

Low Earth Orbit (LEO) constellations are revolutionizing the space sector, with on-board Artificial Intelligence (AI) becoming pivotal for next-generation satellites. AI acceleration is essential both for safety-critical functions such as autonomous Guidance, Navigation, and Control (GNC), where errors cannot be tolerated, and for performance-critical processing of high-bandwidth sensor data, where occasional errors are tolerable. Consequently, AI accelerators for satellites must combine robust protection against radiation-induced faults with high throughput. This paper presents Safe-NEureka, a Hybrid Modular Redundant Deep Neural Network (DNN) accelerator for heterogeneous RISC-V systems. It operates in two modes: a redundancy mode that uses Dual Modular Redundancy (DMR) with hardware-based recovery, and a performance mode that repurposes the redundant datapaths to maximize parallel throughput. In addition, its memory interface is protected by Error Correction Codes (ECCs), and its controller by Triple Modular Redundancy (TMR). Implementation in GlobalFoundries 12nm technology shows a 96% reduction in faulty executions in redundancy mode, with a manageable 15% area overhead. In performance mode, the architecture achieves near-baseline speeds on 3x3 dense convolutions, with a 5% throughput and 11% efficiency reduction, compared to 48% and 53% in redundancy mode. This flexibility confines the high overheads to critical tasks, establishing Safe-NEureka as a versatile solution for space applications.
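
The two operating modes described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' RTL: the names `run_performance`, `run_redundant`, `FlakyDatapath`, and `max_retries` are hypothetical, each "datapath" is reduced to a plain sum over a tile, and recovery is modeled as simply re-executing the tile whose two results disagree.

```python
from typing import Callable, List, Sequence, Tuple

Tile = Sequence[int]
Datapath = Callable[[Tile], int]


def mac(tile: Tile) -> int:
    """Stand-in for one fault-free datapath: accumulate a tile (assumption)."""
    return sum(tile)


class FlakyDatapath:
    """Hypothetical datapath that flips one result bit on a chosen call."""

    def __init__(self, fail_on_call: int) -> None:
        self.calls = 0
        self.fail_on_call = fail_on_call

    def __call__(self, tile: Tile) -> int:
        self.calls += 1
        result = sum(tile)
        if self.calls == self.fail_on_call:
            result ^= 1  # emulate a single-bit upset in the output
        return result


def run_performance(tiles: List[Tile], dp_a: Datapath, dp_b: Datapath) -> List[int]:
    """Performance mode: the two datapaths process different tiles."""
    results = []
    for i in range(0, len(tiles), 2):
        results.append(dp_a(tiles[i]))
        if i + 1 < len(tiles):
            results.append(dp_b(tiles[i + 1]))
    return results


def run_redundant(
    tiles: List[Tile], dp_a: Datapath, dp_b: Datapath, max_retries: int = 3
) -> Tuple[List[int], int]:
    """Redundancy mode: both datapaths compute the same tile (DMR).

    A mismatch is detected by comparison and the tile is re-executed,
    which is a simplified stand-in for hardware-assisted recovery.
    Returns the results and the number of re-executions.
    """
    results = []
    retries = 0
    for tile in tiles:
        for _ in range(max_retries + 1):
            a, b = dp_a(tile), dp_b(tile)
            if a == b:
                results.append(a)
                break
            retries += 1
        else:
            raise RuntimeError("mismatch persisted after all retries")
    return results, retries


tiles = [[1, 2], [3, 4], [5, 6]]
print(run_performance(tiles, mac, mac))
print(run_redundant(tiles, mac, FlakyDatapath(fail_on_call=2)))
```

In this model, performance mode splits the tiles between the two datapaths, while redundancy mode spends both datapaths on every tile and pays extra executions only when a mismatch occurs. That mirrors, at a very coarse level, the throughput-versus-reliability trade-off the abstract reports.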

Paper Structure (23 sections, 13 figures, 1 table)

Introduction
Background
Radiation induced single event effects in space
Reliability techniques for accelerators
Software and algorithmic approaches
Sensitivity-Based Mapping & Scheduling
Hardware & Architecture Level
Fault tolerance in emerging acceleration paradigms
Architecture
Heterogeneous RISC-V Cluster
Safe-NEureka Architecture
Accelerator microarchitecture
Hybrid Modular Redundancy
Hardware-assisted error recovery
Error correction in streamer
...and 8 more sections

Figures (13)

Figure 1: Multi-core RISC-V (RV32) cluster architecture augmented with the Safe-NEureka neural engine, featuring Safe-protected RISC-V cores and ECC-protected TCDM and HCI.
Figure 2: Architecture of run-time reconfigurable Safe-NEureka accelerator.
Figure 3: Comparison of datapath and $\mu$loop execution flows during fault-free operation in Safe-NEureka's performance and redundancy modes.
Figure 4: Pseudo-code for the Safe-NEureka $\mu$loop tiling patterns in performance and redundancy modes.
Figure 5: Graphical representation of the redundancy-mode Safe-NEureka FSM and $\mu$loop operation during error detection and subsequent recovery.
...and 8 more figures
