A 950 MHz SIMT Soft Processor
Martin Langhammer, Gregg Baeckler, Kim Bozman
TL;DR
This work aims to close the gap between the clock rates modern FPGAs can support and the throughput soft processors achieve in practice, by developing a 32-bit fixed-point SIMT soft processor that runs at close to $1$ GHz on Agilex-7 devices. Building on the eGPU architecture, it substantially reworks the instruction fetch/decode/sequencer and the ALU, adding a $32 \times 32$ INT multiplier built from DSP blocks and an integrated shifter designed to maintain throughput in multi-SP configurations. Unconstrained compilations reach $984$ MHz and constrained layouts sustain above $950$ MHz, while multi-core systems achieve around $854$–$927$ MHz depending on packing density and clock-network slack. The results demonstrate a repeatable path to high-speed FPGA design, emphasize the importance of device-aware placement and routing, and point to future work on multi-processor integration and component-level constraint strategies to push toward true GHz-scale performance in FPGA-based accelerators.
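To illustrate how a $32 \times 32$ integer multiplier can be assembled from narrower hardware multipliers, the following is a minimal behavioral sketch in Python. The split into 16-bit halves, the unsigned interpretation, and the function name `mul32x32` are assumptions made for illustration only. The summary above says just that the multiplier is built from DSP blocks, so the actual partial-product widths and pipelining may differ.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32x32(a: int, b: int) -> int:
    """Unsigned 32x32 -> 64-bit product from four 16x16 partial products.

    Hypothetical decomposition for illustration; not taken from the paper.
    """
    a &= MASK32
    b &= MASK32

    # Split each operand into low and high 16-bit halves.
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16

    # Four narrow multiplies, each of which would map to a small hardware multiplier.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Align the partial products by their bit weight and sum them.
    return ll + ((lh + hl) << 16) + (hh << 32)
```

In hardware, the final sum is an adder tree that would typically be pipelined, which is one reason a wide multiplier can limit the achievable clock rate.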
Abstract
Although modern FPGAs have a performance potential of a 1 GHz clock frequency, with both clock networks and embedded blocks such as memories and DSP Blocks capable of these clock rates, user implementations approaching this speed are rarely realized in practice. This is especially true of complex designs such as soft processors. In this work we implement a soft GPGPU that exceeds 950 MHz in an Altera Agilex-7 FPGA. The architecture is a 32-bit fixed-point Single Instruction, Multiple Thread (SIMT) design with parameterized thread and register spaces. Up to 4096 threads and 64K registers can be specified by the user. In one example, a processor with 16K registers and a 16 KB shared memory required approximately 7K ALMs, 99 M20K memories, and 32 DSP Blocks.
