Switchboard: An Open-Source Framework for Modular Simulation of Large Hardware Systems
Steven Herbst, Noah Moroze, Edgar Iglesias, Andreas Olofsson
TL;DR
Switchboard addresses the simulation bottleneck for large hardware systems by building the simulation modularly: a system is represented as repeated, reasonably sized blocks connected via latency-insensitive interfaces, and each block is simulated with a prebuilt, block-local model. The framework uses high-performance shared-memory queues and Verilog bridges to connect RTL, FPGA, and software models at runtime, enabling large, distributed simulations across cloud resources without fine-grained synchronization or global barriers. Key contributions include the Switchboard framework itself, a fast queue implementation, Verilog bridges, a Python API with autowrap, and networked deployment modes (Single-Netlist, Network-of-Networks, Ethernet bridging), plus mixed-signal support via SPICE wrappers. Scalability is demonstrated by a wafer-scale simulation of one million RISC-V cores running on thousands of cloud compute cores, with build-time and run-time speedups over monolithic RTL simulation and traditional Verilator runs. A second application, an interactive web app for simulating chiplets on an interposer, supports rapid design iteration and prototyping.
Abstract
Scaling up hardware systems has become an important tactic for improving performance as Moore's law fades. Unfortunately, simulations of large hardware systems are often a design bottleneck due to slow throughput and long build times. In this article, we propose a solution targeting designs composed of modular blocks connected by latency-insensitive interfaces. Our approach is to construct the hardware simulation in the same way as the design itself, using a prebuilt simulator for each block and connecting the simulators via fast shared-memory queues at runtime. This improves build time, because scaling up the simulation simply involves running more instances of the prebuilt simulators. It also improves simulation speed, because the prebuilt simulators can run in parallel, without fine-grained synchronization or global barriers. We introduce a framework, Switchboard, that implements our approach, and discuss two applications that demonstrate its speed, scalability, and accuracy: (1) a web application where users can run fast simulations of chiplets on an interposer, and (2) a wafer-scale simulation of one million RISC-V cores distributed across thousands of cloud compute cores.
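The idea of composing a simulation from independent block models joined by queues can be illustrated with a minimal sketch. It does not use the Switchboard API: Python threads stand in for separate prebuilt simulator processes, bounded `queue.Queue` objects stand in for the shared-memory queues, and the names `block`, `run_chain`, and `STOP` are hypothetical. Each block only waits on its own input and output queues, so there is no global barrier, and scaling up means starting more instances of the same block model.

```python
import queue
import threading

STOP = None  # hypothetical end-of-stream marker, not part of any real protocol


def block(rx, tx):
    """One block model: take a packet, do block-local work, forward the result."""
    while True:
        packet = rx.get()       # waits until the upstream block has produced data
        if packet is STOP:
            tx.put(STOP)        # propagate shutdown to the next block
            return
        tx.put(packet + 1)      # waits while the downstream queue is full (backpressure)


def run_chain(num_blocks, packets, depth=4):
    """Connect num_blocks identical blocks in a chain and push packets through it."""
    # Bounded queues play the role of latency-insensitive links between blocks.
    links = [queue.Queue(maxsize=depth) for _ in range(num_blocks + 1)]
    workers = [
        threading.Thread(target=block, args=(links[i], links[i + 1]))
        for i in range(num_blocks)
    ]
    results = []

    def collect():
        # Drain the last link so upstream blocks are never stalled indefinitely.
        while True:
            packet = links[-1].get()
            if packet is STOP:
                return
            results.append(packet)

    sink = threading.Thread(target=collect)
    for t in workers + [sink]:
        t.start()
    for p in packets:
        links[0].put(p)
    links[0].put(STOP)
    for t in workers + [sink]:
        t.join()
    return results


if __name__ == "__main__":
    # Each packet passes through three blocks, each adding 1.
    print(run_chain(3, [0, 10, 20]))  # → [3, 13, 23]
```

Because correctness depends only on the order of packets on each link and not on when they arrive, the result is the same for any queue depth or thread scheduling, which is the property that lets block models run without fine-grained synchronization.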
