Rhea: a Framework for Fast Design and Validation of RTL Cache-Coherent Memory Subsystems

Davide Zoni, Andrea Galimberti, Adriano Guarisco

TL;DR

Rhea is a unified framework that streamlines the design and system-level validation of RTL cache-coherent memory subsystems; the evaluation demonstrates its effectiveness and scalability in enabling fast development of such RTL designs.

Abstract

Designing and validating efficient cache-coherent memory subsystems is a critical yet complex task in the development of modern multi-core system-on-chip architectures. Rhea is a unified framework that streamlines the design and system-level validation of RTL cache-coherent memory subsystems. On the design side, Rhea generates synthesizable, highly configurable RTL supporting various architectural parameters. On the validation side, Rhea integrates Verilator's cycle-accurate RTL simulation with gem5's full-system simulation, allowing realistic workloads and operating systems to run alongside the actual RTL under test. We apply Rhea to design MSI-based RTL memory subsystems with one and two levels of private caches and scaling up to sixteen cores. Their evaluation with 22 applications from state-of-the-art benchmark suites shows performance between that of gem5 Ruby's MI and MOESI models. The hybrid gem5-Verilator co-simulation flow incurs a moderate simulation overhead, up to 2.7 times that of gem5 MI, but achieves higher fidelity by simulating real RTL hardware. This overhead decreases with scale, down to 1.6 times in sixteen-core scenarios. These results demonstrate Rhea's effectiveness and scalability in enabling fast development of RTL cache-coherent memory subsystem designs.
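The abstract describes MSI-based RTL memory subsystems. As a rough illustration of what the MSI protocol itself enforces, the following Python sketch models the textbook stable-state transitions (Modified, Shared, Invalid) for a single cache line and checks the single-writer/multiple-reader invariant after every access. It is a generic model under stated assumptions, not Rhea's RTL: the event names (PrRd, PrWr, BusRd, BusRdX), the broadcast snooping bus standing in for whatever interconnect Rhea actually generates, and the omission of transient states and writeback data are simplifications chosen for illustration.

```python
from enum import Enum

# Generic textbook MSI model (assumption: snooping bus, stable states only).
# This is NOT Rhea's RTL; event and action names are hypothetical labels.


class State(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


# (current state, event) -> (next state, action placed on the bus or None).
# Local events: PrRd / PrWr issued by this core.
# Remote events: BusRd / BusRdX snooped from another core.
TRANSITIONS = {
    (State.I, "PrRd"): (State.S, "BusRd"),
    (State.I, "PrWr"): (State.M, "BusRdX"),
    (State.I, "BusRd"): (State.I, None),
    (State.I, "BusRdX"): (State.I, None),
    (State.S, "PrRd"): (State.S, None),
    (State.S, "PrWr"): (State.M, "BusRdX"),  # upgrade modeled as BusRdX
    (State.S, "BusRd"): (State.S, None),
    (State.S, "BusRdX"): (State.I, None),
    (State.M, "PrRd"): (State.M, None),
    (State.M, "PrWr"): (State.M, None),
    (State.M, "BusRd"): (State.S, "Flush"),   # write back dirty data, keep a shared copy
    (State.M, "BusRdX"): (State.I, "Flush"),  # write back dirty data, then invalidate
}


def step(state, event):
    """Return (next_state, action) for one cache line."""
    return TRANSITIONS[(state, event)]


def check_swmr(states):
    """Single-writer/multiple-reader: at most one M, and no S copies alongside an M."""
    writers = states.count(State.M)
    readers = states.count(State.S)
    if writers > 1 or (writers == 1 and readers > 0):
        raise AssertionError(f"coherence invariant violated: {states}")


def simulate(num_cores, ops):
    """Apply (core, "PrRd" | "PrWr") accesses to one shared line; return final states."""
    states = [State.I] * num_cores
    for core, event in ops:
        states[core], bus_msg = step(states[core], event)
        if bus_msg in ("BusRd", "BusRdX"):
            # Every other cache snoops the request; Flush data is not modeled here.
            for other in range(num_cores):
                if other != core:
                    states[other], _ = step(states[other], bus_msg)
        check_swmr(states)
    return states
```

For example, `simulate(2, [(0, "PrRd"), (1, "PrRd"), (1, "PrWr"), (0, "PrRd")])` ends with both copies in S: core 1's write invalidates core 0, and core 0's later read downgrades core 1 from M to S with a flush. A table-driven transition map is used here because it mirrors how coherence controllers are usually specified, one row per (state, event) pair.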

Paper Structure

This paper contains 15 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: High-level flow of Rhea framework.
  • Figure 2: Generic RTL cache-coherent memory subsystem generated by Rhea. The number N of L1 caches and the number K of L2 caches are parameters configured at design time. L2 caches, denoted by dashed lines, are optionally enabled.
  • Figure 3: Detailed architecture of the RTL cache-coherent memory subsystem.
  • Figure 4: Detailed overview of the integrated gem5-Verilator RTL/system-level simulator.
  • Figure 5: Speedup, i.e., normalized execution time in the simulated system, for different applications, numbers of cores, and cache-coherent memory subsystems. The system is configured with two, four, eight, or sixteen cores and with one of gem5 Ruby's MI, MESI, or MOESI cache coherence protocols or the RTL MSI single- or two-level cache-coherent memory subsystem. Time is normalized with respect to the execution with gem5 Ruby's MI coherence protocol.
  • ...and 1 more figure
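Figure 5 reports execution time normalized to gem5 Ruby's MI protocol. A minimal Python sketch of that normalization follows, assuming speedup is computed as the baseline's simulated execution time divided by each configuration's, so values above 1 mean faster than MI. The per-application dictionary layout, the configuration labels, and the geometric-mean aggregation across applications are assumptions for illustration; the source does not state how, or whether, per-application results are aggregated.

```python
import math
from typing import Dict

# Hypothetical layout: application name -> configuration label -> simulated execution time.
Times = Dict[str, Dict[str, float]]


def normalized_speedups(times: Times, baseline: str = "MI") -> Times:
    """Per-application speedup of every configuration relative to `baseline`.

    Assumes speedup = T_baseline / T_config, so the baseline itself maps to 1.0.
    """
    return {
        app: {cfg: per_cfg[baseline] / t for cfg, t in per_cfg.items()}
        for app, per_cfg in times.items()
    }


def geomean_speedup(speedups: Times, cfg: str) -> float:
    """Geometric mean of one configuration's speedup across all applications."""
    vals = [per_cfg[cfg] for per_cfg in speedups.values()]
    return math.exp(sum(math.log(v) for v in vals) / len(vals))
```

A geometric mean is the usual way to average ratios such as these. The same ratio form, with wall-clock simulation time in place of simulated execution time, underlies the abstract's simulation-overhead figures (up to 2.7 times relative to gem5 MI).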