
Trikarenos: Design and Experimental Characterization of a Fault-Tolerant 28nm RISC-V-based SoC

Michael Rogenmoser, Philip Wiese, Bruno Endres Forlin, Frank K. Gürkaynak, Paolo Rech, Alessandra Menicucci, Marco Ottavi, Luca Benini

TL;DR

This work investigates the reliability of a fault-tolerant 28nm RISC-V SoC, Trikarenos, designed for automotive and space applications. It combines ECC-protected SRAM with a triple-core lockstep (TCLS) scheme to mitigate single-event upsets, and validates performance through neutron and proton beam tests as well as gate-level fault-injection simulations. Key findings show SRAM cross-sections around $10^{-14}$ cm$^2$/bit under neutrons and $10^{-15}$ cm$^2$/bit under protons, with TCLS recovering the majority of core-related faults and reducing the effective flip-flop cross-section to the order of $10^{-14}$ cm$^2$/FF. The results demonstrate the practicality of TCLS plus ECC for radiation-hardening in real hardware, while highlighting vulnerabilities in unprotected interconnects and peripherals that warrant further protection and observability for mission-critical deployments.

Abstract

RISC-V-based fault-tolerant system-on-chip (SoC) designs are critical for the new generation of automotive and space SoC architectures. However, reliability assessment requires characterization under controlled radiation doses to accurately quantify the fault tolerance of the fabricated designs. This work analyzes the Trikarenos design, a SoC implemented in TSMC 28nm, for single event upset (SEU) vulnerability under atmospheric neutron and 200 MeV proton radiation, comparing these results to simulation-based fault injection. All faults in error correction codes (ECC) protected memory are corrected by a scrubber, showing an estimated cross-section per bit of up to $1.09 \times 10^{-14}$ cm$^2$ bit$^{-1}$. Furthermore, the triple-core lockstep (TCLS) mechanism implemented in Trikarenos is validated and is shown to correct errors affecting a cross-section up to $3.23 \times 10^{-11}$ cm$^2$, with the remaining uncorrectable vulnerability below $5.36 \times 10^{-12}$ cm$^2$. When augmenting the experimental analysis of fabricated chips with gate-level fault injection in simulation, 99.10 % of injections into the SoC produced correct results, while 100 % of injections in the TCLS-protected cores were handled correctly. With 12.28 % of all injected faults leading to a TCLS recovery, this indicates an approximate effective flip-flop cross-section of up to $1.28 \times 10^{-14}$ cm$^2$/FF.
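The cross-sections quoted above follow the standard definition used in beam testing: the number of observed events divided by the particle fluence, optionally normalized by the number of bits or flip-flops exposed. The following is a minimal sketch of that arithmetic only. The function names are hypothetical and the inputs are placeholders, not measurements from the paper.

```python
import math


def cross_section(events: int, fluence_per_cm2: float) -> float:
    """Device cross-section in cm^2: observed events divided by fluence (particles/cm^2)."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence_per_cm2


def per_element_cross_section(events: int, fluence_per_cm2: float, elements: int) -> float:
    """Cross-section normalized per element (e.g. per SRAM bit or per flip-flop)."""
    if elements <= 0:
        raise ValueError("element count must be positive")
    return cross_section(events, fluence_per_cm2) / elements


# Placeholder inputs chosen for illustration only (assumptions, not paper data).
example_events = 50            # hypothetical number of observed upsets
example_fluence = 1e12         # hypothetical fluence in particles/cm^2
example_bits = 1_000_000       # hypothetical number of exposed bits

sigma_device = cross_section(example_events, example_fluence)
sigma_per_bit = per_element_cross_section(example_events, example_fluence, example_bits)
```

With real data, `events` would be the count of corrected SRAM errors or TCLS recoveries logged during a run, and `fluence_per_cm2` the integrated beam fluence over the same interval.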

Paper Structure (33 sections, 2 equations, 5 figures, 5 tables)


Figures (5)

  • Figure 1: Trikarenos SoC architecture, including (a) the overall system block diagram, (b) the TCLS mechanism for core fault tolerance, and (c) the SRAM error correction and scrubbing system. The design is optimized for fault tolerance in automotive and space applications.
  • Figure 2: Annotated die shot of Trikarenos, highlighting the spatial distribution of the three Ibex cores, memory banks, interconnect, and key peripherals within the 2mm die.
  • Figure 3: Area and flip-flop (FF) breakdown for the Trikarenos SoC. The three Ibex cores and the memory banks are protected, while the remaining components are unprotected.
  • Figure 4: Annotated image of the testing setup at HollandPTC (top) and schematic of the testing setup (bottom), showing the proton beam, the device under test (DUT), and the test harness.
  • Figure 5: Visualization of induced errors and flux rates during both radiation tests of Trikarenos. We show the cross-sections, flux rates, TCLS events, and system crashes over time. We use an atmospheric-like neutron beam [cazzaniga_progress_2018] and 200 MeV protons. For the proton tests, no total ionizing dose (TID)-related degradation was observed.