CVA6-VMRT: A Modular Approach Towards Time-Predictable Virtual Memory in a 64-bit Application Class RISC-V Processor

Christopher Reinwardt, Robert Balas, Alessandro Ottaviano, Angelo Garofalo, Luca Benini

TL;DR

This work tackles the challenge of producing time-predictable execution on a 64-bit RISC-V core under virtualization by addressing interference in virtual memory. It introduces CVA6-VMRT, which adds hardware support for per-thread TLB partitioning and locking, plus a hybrid L1 cache/SPM that can be dynamically configured to favor critical tasks. The approach reduces execution-time variability for critical guests under interference by up to 94%, with only about 3.7%–4% area overhead and no timing penalty in synthesis at 16 nm. The results demonstrate a practical pathway to time-predictable mixed-criticality automotive systems without the large hardware burden of full physical isolation or software-managed memory schemes.

Abstract

The increasing complexity of autonomous systems has driven a shift to integrated heterogeneous SoCs with real-time and safety demands. Ensuring deterministic WCETs and low latency for critical tasks requires minimizing interference on shared resources like virtual memory. Existing techniques, such as software coloring and memory replication, introduce significant area and performance overhead, especially with virtualized memory, where address translation adds latency uncertainty. To address these limitations, we propose CVA6-VMRT, an extension of the open-source RISC-V CVA6 core that adds hardware support for predictable virtual memory access with minimal area overhead. CVA6-VMRT features dynamically partitioned Translation Look-aside Buffers (TLBs) and hybrid L1 cache/scratchpad memory (SPM) functionality. It allows fine-grained per-thread control of resources, enabling the operating system to manage TLB replacements, including static overwrites, to ensure single-cycle address translation for critical memory regions. Additionally, CVA6-VMRT enables runtime partitioning of data and instruction caches into cache and SPM sections, providing low and predictable access times for critical data without impacting other accesses. In a virtualized setting, CVA6-VMRT enhances execution time determinism for critical guests by 94% during interference from non-critical guests, with minimal impact on their average absolute execution time compared to isolated execution of the critical guests only. This interference-aware behaviour is achieved with just a 4% area overhead and no timing penalty compared to the baseline CVA6 core.
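
To make the TLB partitioning and locking idea more concrete, a small software model may help. This is only an illustrative sketch under stated assumptions, not the CVA6-VMRT hardware or its programming interface: the class `PartitionedTLB`, the partition names, the page numbers, and the use of true LRU replacement (rather than the PLRU scheme shown in the paper's figures) are all hypothetical choices made for this example.

```python
from collections import OrderedDict


class PartitionedTLB:
    """Toy fully associative TLB split into fixed-size partitions.

    Assumption: each partition uses true LRU replacement and may hold
    locked entries that are never chosen as eviction victims.
    """

    def __init__(self, partition_sizes):
        # partition_sizes: dict mapping a partition id to its entry count
        self.capacity = dict(partition_sizes)
        # One LRU-ordered map (virtual page -> physical page) per partition
        self.entries = {pid: OrderedDict() for pid in partition_sizes}
        self.locked = {pid: set() for pid in partition_sizes}

    def lookup(self, pid, vpn):
        """Return the cached translation, or None on a TLB miss."""
        part = self.entries[pid]
        if vpn in part:
            part.move_to_end(vpn)  # mark as most recently used
            return part[vpn]
        return None

    def insert(self, pid, vpn, ppn, lock=False):
        """Install a translation, evicting the oldest unlocked entry if full."""
        part = self.entries[pid]
        if vpn not in part and len(part) >= self.capacity[pid]:
            victim = next((v for v in part if v not in self.locked[pid]), None)
            if victim is None:
                return False  # every entry is locked; nothing can be replaced
            del part[victim]
        part[vpn] = ppn
        part.move_to_end(vpn)
        if lock:
            self.locked[pid].add(vpn)
        return True


def count_misses(tlb, pid, vpns):
    """Access each page once; refill on a miss and return the miss count."""
    misses = 0
    for vpn in vpns:
        if tlb.lookup(pid, vpn) is None:
            misses += 1
            tlb.insert(pid, vpn, ppn=vpn + 0x1000)
    return misses


def run(partitions, crit_pid, noisy_pid, lock):
    """Critical misses after a noisy task has touched 100 distinct pages."""
    tlb = PartitionedTLB(partitions)
    critical_pages = [0x10, 0x11, 0x12]
    for vpn in critical_pages:
        tlb.insert(crit_pid, vpn, vpn + 0x1000, lock=lock)
    count_misses(tlb, noisy_pid, range(0x100, 0x164))
    return count_misses(tlb, crit_pid, critical_pages)


if __name__ == "__main__":
    # Shared, unlocked TLB: the noisy task evicts all critical entries.
    print(run({"shared": 8}, "shared", "shared", lock=False))  # prints 3
    # Partitioned TLB with locked critical entries: no critical misses.
    print(run({"critical": 4, "noncritical": 4},
              "critical", "noncritical", lock=True))  # prints 0
```

In this model, the critical task sees three misses when it shares an unlocked TLB with the interfering task, and none once its entries sit in their own partition and are locked. The real design operates in hardware and is controlled by the operating system; the model only illustrates why isolating and pinning translations removes one source of timing variability.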

Paper Structure

This paper contains 20 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: CVA6-VMRT MMU and hybrid cache/SPM subsystem. Modified architectural blocks are highlighted in red.
  • Figure 2: PLRU behavior example for an eight-entry TLB.
  • Figure 3: Cheshire microarchitecture and software stack.
  • Figure 4: Box plots for the synthetic benchmark using different interference mitigation configurations. Without our extensions, locking reduces the standard deviation by 60% compared to the unmitigated case. Using our extensions in addition to locking decreases the standard deviation further, bringing the total reduction to 91%.
  • Figure 5: Results of the powerwindow benchmark for varying levels of enabled interference mitigations. Combining partitioning and locking with the functionality of CVA6-VMRT achieves a 94% reduction in execution time standard deviation over the unmitigated case.
  • ...and 1 more figure