Resilient and Secure Programmable System-on-Chip Accelerator Offload
Inês Pinto Gouveia, Ahmad T. Sheikh, Ali Shoker, Suhaib A. Fahmy, Paulo Esteves-Verissimo
TL;DR
Samsara addresses computing integrity in programmable MPSoCs by combining hardware-based state-machine replication with tile rejuvenation. It introduces H-Quorum, a lightweight hardware Byzantine fault-tolerant protocol that relies on a trusted Controller to mediate requests and perform fast state transfer, tolerating up to $f$ faulty tiles with $2f+1$ replicas (a majority of $f+1$ matching responses suffices). Empirically, Samsara achieves latency close to that of non-replicated accelerators and up to $35.9\%$ lower than prior hardware-based BFT schemes, while rejuvenation incurs negligible latency relative to full reboots. This work demonstrates a practical, low-overhead path to secure and resilient on-chip accelerators for CPS/IoT and similar domains, leveraging partial reconfiguration and diverse bitstreams to improve fault independence.
Abstract
Computational offload to hardware accelerators is gaining traction due to increasing computational demands and efficiency challenges. Programmable hardware, like FPGAs, offers a promising platform in rapidly evolving application areas, combining the benefits of hardware acceleration with software programmability. Unfortunately, systems composed of multiple hardware components must preserve computing integrity even when some components behave maliciously. In this work, we propose Samsara, the first secure and resilient platform that derives from Byzantine Fault Tolerant (BFT) protocols to enhance the computing resilience of programmable hardware. Samsara uses a novel lightweight hardware-based BFT protocol for Systems-on-Chip, called H-Quorum, which achieves the theoretical minimum latency between applications and replicated compute nodes. To withstand malicious behaviors, Samsara supports hardware rejuvenation, which is used to replace, relocate, or diversify faulty compute nodes. Samsara's architecture secures the entire workflow while keeping the latency overhead of both computation and rejuvenation close to that of the non-replicated counterpart.
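The quorum logic behind H-Quorum can be illustrated with a minimal sketch: with $2f+1$ replicated tiles and at most $f$ Byzantine ones, any result reported by at least $f+1$ replicas must have come from at least one correct tile, so the Controller can safely deliver it; absent such a majority, the faulty tiles are candidates for rejuvenation. The function name and return convention below are hypothetical illustrations, not Samsara's actual interface.

```python
from collections import Counter

def controller_vote(responses, f):
    """Majority vote over replica responses (hypothetical helper).

    With 2f+1 replicas and at most f Byzantine ones, a value reported
    by >= f+1 replicas is guaranteed to originate from a correct tile.
    """
    assert len(responses) == 2 * f + 1, "H-Quorum expects 2f+1 replicas"
    value, count = Counter(responses).most_common(1)[0]
    if count >= f + 1:
        return value   # majority value is safe to deliver to the application
    return None        # no majority: signal that rejuvenation is needed

# Example: f = 1, so 3 replicas; one faulty tile returns a wrong result.
print(controller_vote(["0xBEEF", "0xBEEF", "0xDEAD"], f=1))  # → 0xBEEF
```

This also shows why $2f+1$ is the minimum replica count for this scheme: with fewer replicas, $f$ colluding faulty tiles could match or outnumber the correct ones and no safe majority would exist.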
