Efficient Trace for RISC-V: Design, Evaluation, and Integration in CVA6
Umberto Laghi, Simone Manoni, Emanuele Parisi, Andrea Bartolini
TL;DR
The paper tackles the challenge of obtaining fine-grained, low-overhead program execution traces on edge RISC-V platforms. It presents a Tracing System (TS) that complies with the RISC-V Efficient Trace (E-Trace) specification and integrates it into the CVA6-based Shaheen edge platform, enabling block-based trace encoding and reconstruction. The core contribution is a Trace Encoder (TE) architecture built from three main modules (te_filter, te_priority, and te_packet_emitter) and three supporting components (te_reg, te_resync_counter, and te_branch_map). The TE produces E-Trace packets from per-block discontinuities, is replicated to handle multiple retirements per cycle, and exposes an AXI4/TRACE path via a TIP. Empirical results on a Xilinx VCU118 FPGA show an area overhead of about 9.2% on the CVA6 subsystem (roughly 10% of the CVA6 core area), an average trace compression of 95.1% across benchmarks relative to tracing the full opcode of every instruction, and no impact on the core's critical path, which supports the approach's practicality for edge deployments. The work also outlines future directions, including broader benchmarks, TE feature enhancements, and an open-source release of the implementation.
Abstract
In this work, we present the design and evaluation of a Processor Tracing System compliant with the RISC-V Efficient Trace specification for Instruction Branch Tracing. We integrate our system into the host domain of a state-of-the-art edge architecture based on CVA6. The proposed Tracing System adds a total resource-utilization overhead of 9.2% to the CVA6 subsystem on a Xilinx VCU118 FPGA, while achieving an average compression rate of 95.1% on platform-specific tests, compared to tracing the full opcode of every instruction.
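As a rough illustration of how a compression rate against a full-opcode baseline could be computed, the sketch below compares the size of an encoded trace with the size of emitting every retired instruction's opcode. The function name, the fixed 4-byte opcode size, and the sample numbers are assumptions made for illustration only; they are not taken from the paper, and the authors' exact measurement method may differ (for example, in how compressed 16-bit instructions are counted).

```python
def compression_rate(n_instructions: int, trace_bytes: int, opcode_bytes: int = 4) -> float:
    """Fraction of bytes saved relative to emitting every opcode in full.

    Assumption: every instruction is counted as `opcode_bytes` bytes in the
    baseline (4 bytes here, i.e. an uncompressed 32-bit RISC-V encoding).
    """
    if n_instructions <= 0:
        raise ValueError("n_instructions must be positive")
    baseline_bytes = n_instructions * opcode_bytes
    return 1.0 - trace_bytes / baseline_bytes


# Hypothetical numbers, not measurements from the paper:
# 1000 retired instructions encoded into 200 bytes of trace packets.
rate = compression_rate(n_instructions=1000, trace_bytes=200)
print(f"{rate:.1%}")  # prints "95.0%"
```

Under this definition, a rate of 95% means the encoded trace occupies one twentieth of the bytes that a full-opcode trace of the same execution would need.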
