Design and implementation of a synchronous Hardware Performance Monitor for a RISC-V space-oriented processor

Miguel Jiménez Arribas, Agustín Martínez Hellín, Manuel Prieto Mateo, Iván Gamino del Río, Andrea Fernandez Gallego, Oscar Rodríguez Polo, Antonio da Silva, Pablo Parra, Sebastián Sánchez

TL;DR

The paper addresses the need for precise timing and behavior statistics in space-grade RISC-V processors by introducing a synchronous PMU that decouples event triggering from counting and aligns event increments with instruction retirement. The decentralized event triggering across the pipeline, combined with retirement-synchronized counting, yields accurate per-instruction event attribution and easy extensibility. Validation on a RISC-V OBC using Dhrystone and CoreMark benchmarks, plus cross-platform comparisons, demonstrates correct operation, reproducibility, and a clear execution model, while incurring only modest resource and power overheads. This PMU enables enhanced observability and debugging capabilities for safety-critical onboard software, with practical implications for reliability analyses and future architectural enhancements.

Abstract

The ability to collect statistics about the execution of a program within a CPU is of the utmost importance across all fields of computing, since it allows the timing performance of a program to be characterized. This capability is even more relevant in safety-critical software systems, where software timing requirements must be analyzed to ensure the correct operation of the programs. Moreover, to properly evaluate and verify the extra-functional properties of these systems, a CPU offers many other statistics besides timing performance, such as those associated with resource utilization. In this paper, we showcase a Performance Measurement Unit (PMU), also known as a Hardware Performance Monitor, integrated into a RISC-V On-Board Computer (OBC) designed for space applications by our research group. The monitoring technique features a novel approach whereby triggered events are not counted immediately but are instead propagated through the pipeline so that their annotation is synchronized with the executed instruction. Additionally, we demonstrate the use of this PMU in a process to characterize the execution model of the processor. Finally, as an example of the statistics provided by the PMU, we show the results obtained by running the CoreMark and Dhrystone benchmarks on the RISC-V OBC.
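The difference between counting an event when it fires and counting it when its instruction retires can be illustrated with a small software model. The following is a minimal sketch, not the authors' hardware: the function `simulate`, the event names, the instruction labels, and the stage indices are all hypothetical. It also assumes that an instruction squashed before retirement drops the events it carried, which the abstract does not state explicitly.

```python
from collections import Counter
from dataclasses import dataclass, field

STAGES = 5  # five pipeline stages, indexed 0..4 (indices are illustrative)


@dataclass
class Slot:
    """One in-flight instruction plus the events it has triggered so far."""
    name: str
    triggered_events: set = field(default_factory=set)


def simulate(program, events_at, squashed_at=None):
    """Advance `program` through the pipeline, one stage per cycle.

    events_at:   {(instruction, stage): event} raised by that instruction there.
    squashed_at: {instruction: stage} where the instruction is killed
                 (assumption: its pending events are discarded with it).
    Returns (immediate, synchronous) counters for comparison.
    """
    squashed_at = squashed_at or {}
    pipeline = [None] * STAGES
    pending = list(program)
    immediate = Counter()    # counts an event in the cycle it fires
    synchronous = Counter()  # counts an event only when its instruction retires

    while pending or any(slot is not None for slot in pipeline):
        # Shift every instruction one stage forward and fetch the next one.
        incoming = Slot(pending.pop(0)) if pending else None
        pipeline = [incoming] + pipeline[:-1]

        for stage, slot in enumerate(pipeline):
            if slot is None:
                continue
            event = events_at.get((slot.name, stage))
            if event is not None:
                immediate[event] += 1
                slot.triggered_events.add(event)  # travels with the instruction
            if squashed_at.get(slot.name) == stage:
                pipeline[stage] = None            # replaced by a bubble
            elif stage == STAGES - 1:
                synchronous.update(slot.triggered_events)
                synchronous["retired"] += 1

    return immediate, synchronous


# Hypothetical scenario: a wrong-path instruction misses in the instruction
# cache and is then squashed; a load misses in the data cache and retires.
immediate, synchronous = simulate(
    program=["lw", "beq", "wrong_path_add", "sub"],
    events_at={("lw", 3): "dcache_miss", ("wrong_path_add", 0): "icache_miss"},
    squashed_at={"wrong_path_add": 2},
)
print("immediate:  ", dict(immediate))
print("synchronous:", dict(synchronous))
```

In this model the immediate counter reports the instruction-cache miss of the squashed instruction, while the synchronous counter attributes events only to the three instructions that actually retire.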

Paper Structure (14 sections, 8 figures, 12 tables)

This paper contains 14 sections, 8 figures, 12 tables.

Figures (8)

  • Figure 1: Baseline pipeline structure. The five pipeline stages are shown in orange, and the inter-stage registers that synchronize the data transfer between them are shown in green. Other functional units are shown in blue, including the general-purpose register file.
  • Figure 2: Simplified abstraction showing the coupling between each of the five pipeline stages and its corresponding inter-stage register. Each stage-register pair is merged into a single yellow block, hence the difference in color coding. The figure also shows how the triggered_events data structure is monitored and chained through the pipeline until it arrives at the count module, where the events are finally added up. This module is an auxiliary functional unit rather than a pipeline stage, since it has no effect on the execution of instructions, and is therefore shown in blue.
  • Figure 3: Example illustrating how the hazard event detection mechanism and its storage are integrated into the existing processor under the proposed design. The signal “Insert hazard bubble” already existed so that the control unit knows when to insert a bubble. The PMU mechanism therefore only monitors existing signals within the design and registers them in the triggered_events data structure. Notably, this process occurs in parallel without introducing sequential logic, ensuring no timing penalty.
  • Figure 4: Complete diagram of the PMU. Its reach spans the entire pipeline: monitored events are stored in the triggered_events data structure, which ultimately arrives at the CSR unit. There, the configuration registers determine the behavior of the PMU, and the performance counters are located and finally incremented.
  • Figure 5: Example of the concurrent read and write accesses between CSRs and GPRs required for the atomic modification of the CSRs.
  • ...and 3 more figures
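
The atomic CSR modification mentioned in the Figure 5 caption corresponds to the standard RISC-V CSR instructions, which read the old CSR value into a general-purpose register and write a new value in a single instruction. The following is a minimal sketch of those semantics only, assuming a 32-bit register width; the helper `csr_op` and the register name `pmu_config` are hypothetical, and the use of `mhpmcounter3` as the counter being read is an assumption, not something the listing above confirms for this processor.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit CSRs


def csr_op(op, csrs, gprs, csr, rd, rs1):
    """Model one CSR instruction as a single read-modify-write step.

    The old CSR value and the source register are both sampled before
    anything is written, so rd and rs1 may name the same register.
    """
    old = csrs[csr]
    src = 0 if rs1 == 0 else gprs[rs1]  # x0 always reads as zero
    if op == "csrrw":      # swap: CSR <- rs1
        new = src
    elif op == "csrrs":    # set the bits that are 1 in rs1
        new = old | src
    elif op == "csrrc":    # clear the bits that are 1 in rs1
        new = old & ~src
    else:
        raise ValueError(f"unknown CSR operation: {op}")
    csrs[csr] = new & MASK
    if rd != 0:            # writes to x0 are discarded
        gprs[rd] = old


gprs = [0] * 32
csrs = {"mhpmcounter3": 1234, "pmu_config": 0b0001}

# Read and clear a counter in one step: x5 holds 0 going in and receives
# the old counter value, while the counter is overwritten with 0.
csr_op("csrrw", csrs, gprs, "mhpmcounter3", rd=5, rs1=5)

# Set one extra configuration bit without disturbing the others.
gprs[6] = 0b1000
csr_op("csrrs", csrs, gprs, "pmu_config", rd=0, rs1=6)

print(gprs[5], csrs["mhpmcounter3"], bin(csrs["pmu_config"]))
```

Because the read of the old value and the write of the new value belong to one instruction, no event can be lost between reading a counter and resetting it.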