
Security and RAS in the Computing Continuum

Martí Alonso, David Andreu, Ramon Canal, Stefano Di Carlo, Odysseas Chatzopoulos, Cristiano Chenet, Juanjo Costa, Andreu Girones, Dimitris Gizopoulos, George Papadimitriou, Enric Morancho, Beatriz Otero, Alessandro Savino

TL;DR

The paper tackles pervasive security and RAS (reliability, availability, serviceability) challenges in the computing continuum, emphasizing AI-driven detection of malware and hardware attacks on open RISC-V platforms. It proposes a dual-source methodology that combines dynamic features from Hardware Performance Counters (HPCs) with static opcode analysis to train ML models. It also proposes a RAS framework for large-scale SoCs that uses simulation-level and gate-level validation to address silent data corruptions (SDCs). Key findings show high accuracy for supervised malware detection, promising results for unsupervised SBO detection, and ISA-dependent SDC vulnerabilities, with RISC-V often more prone to SDCs caused by faults in L1 data caches without ECC. The work highlights the practical importance of integrating robust RAS features and cross-platform AI security into future edge-to-cloud systems, in line with the Vitamin-V goals for secure, reliable computing-continuum architectures.
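The dual-source idea can be illustrated with a small sketch. It assumes each program is summarized by a few HPC readings, already normalized to per-instruction rates (here a hypothetical branch-miss rate and cache-miss rate), plus a histogram over a hypothetical opcode vocabulary taken from its disassembly; the two are concatenated into one feature vector. The paper summary does not say which counters, opcodes, or ML models are used, so the nearest-centroid classifier (supervised) and the per-feature z-score detector trained only on benign runs (unsupervised) below are stand-ins chosen for brevity, not the authors' models.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev

# Hypothetical opcode vocabulary; the paper does not list the opcodes it uses.
OPCODES = ["add", "lw", "sw", "jal", "ecall"]


def opcode_histogram(opcodes):
    """Static source: relative frequency of each vocabulary opcode in a disassembly."""
    counts = Counter(opcodes)
    total = max(len(opcodes), 1)
    return [counts[op] / total for op in OPCODES]


def features(hpc_rates, opcodes):
    """Dual-source vector: dynamic HPC rates followed by static opcode frequencies.

    hpc_rates is assumed to be pre-normalized (events per instruction) so that
    both sources have comparable scales in the distance computations below.
    """
    return list(hpc_rates) + opcode_histogram(opcodes)


def centroid(rows):
    return [mean(col) for col in zip(*rows)]


def distance(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_nearest_centroid(samples):
    """Supervised stand-in: one centroid per label."""
    by_label = {}
    for vec, label in samples:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, vec):
    """Return the label whose centroid is closest to vec."""
    return min(model, key=lambda label: distance(model[label], vec))


def train_anomaly_detector(benign_rows, k=3.0):
    """Unsupervised stand-in: per-feature mean/std learned from benign runs only."""
    mus = centroid(benign_rows)
    # Floor the std so constant features do not cause division by zero.
    sds = [pstdev(col) or 1e-9 for col in zip(*benign_rows)]
    return mus, sds, k


def is_anomalous(detector, vec):
    """Flag vec if any feature lies more than k standard deviations from the benign mean."""
    mus, sds, k = detector
    return any(abs(x - m) / s > k for x, m, s in zip(vec, mus, sds))


# Illustrative toy data (not from the paper): [branch-miss rate, cache-miss rate] + opcodes.
train = [
    (features([0.02, 0.01], ["add", "lw", "sw", "add"]), "benign"),
    (features([0.03, 0.02], ["add", "lw", "jal", "sw"]), "benign"),
    (features([0.20, 0.15], ["ecall", "ecall", "jal", "lw"]), "malware"),
    (features([0.25, 0.18], ["ecall", "jal", "ecall", "ecall"]), "malware"),
]
model = train_nearest_centroid(train)
detector = train_anomaly_detector([vec for vec, label in train if label == "benign"])

sample = features([0.22, 0.16], ["ecall", "ecall", "jal", "add"])
print(predict(model, sample))  # malware
print(is_anomalous(detector, sample))  # True
```

Nearest-centroid and z-score thresholding were picked only because they fit in a few lines without third-party libraries; the point is the shape of the pipeline (two feature sources, one vector, one detector), and any of the classifiers the paper compares in Figure 2 could take their place.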

Abstract

Security and RAS are two non-functional requirements under focus for current systems developed for the computing continuum. Due to the growing number of interconnected computer systems across the continuum, security becomes especially pervasive at all levels, from the smallest edge device to the high-performance cloud at the other end. Similarly, RAS (Reliability, Availability, and Serviceability) ensures the robustness of a system against hardware defects, that is, making systems reliable, highly available, and designed for easy servicing. In this paper, which results from the Vitamin-V EU project, the authors detail their comprehensive approach to malware and hardware attack detection, as well as the RAS features envisioned for future systems across the computing continuum.

Paper Structure

This paper contains 16 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Confusion matrices for detecting malware using a balanced dataset (left), a benign dataset (center), and a malicious dataset (right).
  • Figure 2: Accuracy of unsupervised SBO detection for different benchmarks and classifiers. The malicious function executes fewer than 1% of the instructions executed by the original application.
  • Figure 3: SDC probability due to permanent faults in the L1 instruction cache [10567770].
  • Figure 4: SDC probability due to permanent faults in the L1 data cache [10567770].
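Figures 1, 3 and 4 report confusion matrices and SDC probabilities. The sketch below shows how such numbers are commonly derived: accuracy, precision and recall from a binary confusion matrix, and an SDC probability estimated as the share of fault-injection runs whose outcome is an SDC. The outcome labels (`Masked`, `SDC`, `Crash`), the Wilson score interval, and all counts are hypothetical choices for illustration; the captions do not describe the paper's injection campaign or its exact metrics.

```python
from math import sqrt


def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall of a binary detector ('malware' = positive class)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def sdc_probability(outcomes, z=1.96):
    """Fraction of fault-injection runs ending in an SDC, plus a Wilson score interval.

    z=1.96 gives an approximately 95% interval.
    """
    n = len(outcomes)
    k = sum(1 for outcome in outcomes if outcome == "SDC")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return p, (center - half, center + half)


# Illustrative numbers only (not results from the paper).
acc, prec, rec = confusion_metrics(tp=45, fn=5, fp=2, tn=48)
print(f"accuracy={acc:.2f} precision={prec:.3f} recall={rec:.2f}")
# accuracy=0.93 precision=0.957 recall=0.90

runs = ["Masked"] * 80 + ["SDC"] * 15 + ["Crash"] * 5
p_sdc, (low, high) = sdc_probability(runs)
print(f"P(SDC)={p_sdc:.2f}, 95% CI=({low:.3f}, {high:.3f})")
```

The interval is worth reporting because SDC probabilities come from a finite number of injections: a campaign with few runs per cache configuration yields wide intervals, which limits how confidently two ISAs can be ranked against each other.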