Wildcat: Educational RISC-V Microprocessors

Martin Schoeberl

TL;DR

The paper questions the traditional dominance of the five-stage RISC pipeline in education and demonstrates that simpler 3-stage, and even 4-stage, designs can match or exceed the clock frequency of longer pipelines while reducing resource usage. Wildcat implements three in-order RISC-V pipelines in Chisel (3-, 4-, and 5-stage), accompanied by an RV32I ISA simulator, and evaluates them on two FPGA families and the SkyWater130 open-source ASIC flow. The results show that the 3-stage design often achieves a higher fmax than the longer designs because of a shorter critical path in the execution stage, with memory-based register files further enhancing performance; ASIC results corroborate the FPGA findings. The work provides three educational cores, compares pipeline lengths directly, with all cores written by the same author in the same language, and makes the designs openly available to support teaching and practical exploration of RISC-V implementations in FPGA/ASIC contexts.

Abstract

In computer architecture courses, we usually teach RISC processors using a five-stage pipeline, neglecting alternative organizations. This design choice, rooted in 1980s technology, may not be optimal today, and it is certainly not the easiest pipeline for education. This paper examines more straightforward pipeline organizations for RISC processors that are suitable for educational purposes and for implementing embedded processors in FPGAs and ASICs. We analyze resource costs and maximum clock frequency of various designs implemented in an FPGA, using clock frequency as a performance proxy. Additionally, we validate these results with ASIC designs synthesized using the open-source SkyWater130 process. Contrary to common wisdom, a longer pipeline (up to 5 stages) does not necessarily increase the maximum clock frequency. In two FPGA implementations and one ASIC implementation, we found that a four- or five-stage pipeline leads to a lower clock frequency than a three-stage implementation. The reason is that the forwarding multiplexer in the execution stage, which is on the critical path, becomes wider with longer pipelines. We also argue that a 3-stage pipeline organization is better suited for teaching the pipeline organization of a microprocessor.
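
The forwarding argument in the abstract can be illustrated with a small behavioral sketch. It is not taken from the Wildcat sources: the names `forward_operand` and `mux_inputs` are hypothetical, and the model assumes only that each older instruction whose result has not yet reached the register file contributes one extra input to the operand multiplexer. Under that assumption, the textbook 5-stage pipeline, which forwards from the EX/MEM and MEM/WB pipeline registers, needs a 3-input multiplexer per operand, while a pipeline with fewer in-flight results needs a narrower one.

```python
from typing import List, Tuple

# (destination register, result value) of an older instruction whose
# result has not yet been written to the register file.
InFlight = Tuple[int, int]


def forward_operand(rs: int, regfile_value: int, in_flight: List[InFlight]) -> int:
    """Select the value for source register rs.

    in_flight is ordered youngest first, so the first match is the most
    recent write and takes priority over older ones.
    """
    if rs == 0:
        return 0  # x0 is hard-wired to zero in RISC-V and is never forwarded
    for rd, value in in_flight:
        if rd == rs:
            return value
    return regfile_value  # no hazard: use the value read from the register file


def mux_inputs(in_flight_count: int) -> int:
    """Inputs of the operand multiplexer in this simplified model:
    one per forwarding source plus one for the register file."""
    return in_flight_count + 1


if __name__ == "__main__":
    # Textbook 5-stage pipeline: two forwarding sources (EX/MEM, MEM/WB).
    print(mux_inputs(2))
    # Youngest matching result wins over the older one and the register file.
    print(forward_operand(5, 10, [(5, 42), (5, 7)]))
```

Each additional forwarding source adds both a comparator on the destination register and one more multiplexer input in front of the ALU, which is why the selection logic can lengthen the critical path of the execution stage.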

Paper Structure

This paper contains 17 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: A textbook-style 5-stage RISC-V processor pipeline.
  • Figure 2: A 3-stage RISC-V processor pipeline.