Reproducibility and Standardization in gem5 Resources v25.0
Kunal Pai, Harshil Patel, Erin Le, Noah Krim, Mahyar Samani, Bobby R. Bruce, Jason Lowe-Power
TL;DR
Reproducibility in simulation-based computer architecture research is challenged by inconsistent artifact sharing, ad hoc disk-image workflows, and tightly coupled guest-host control. The authors introduce a Packer-based, ISA-spanning disk-image workflow with validated base images and pre-annotated benchmarks, alongside a class-based exit-event system and hypercalls for decoupled host control. They further provide gem5-bridge for user-space m5 operations and Suite/MultiSim for parallel experiment orchestration, all designed to run from gem5 configuration scripts. Validation demonstrates consistent benchmark annotations and successful cross-ISA execution, including upstream ARM boot fixes. Collectively, these contributions streamline setup, standardize execution, and enable scalable, reproducible gem5 studies with centralized, extensible resources across x86, ARM, and RISC-V.
Abstract
Reproducibility in simulation-based computer architecture research requires coordinating artifacts such as disk images, kernels, and benchmarks, but existing workflows are inconsistent. We improve gem5, an open-source simulator with over 1600 forks, and gem5 Resources, a centralized repository of over 2000 pre-packaged artifacts, to address these issues. Although gem5 Resources enables artifact sharing, researchers still face three challenges. First, creating custom disk images is complex and time-consuming, and no standardized process exists across ISAs, which makes images difficult to extend and share. Second, gem5 provides only limited guest-host communication through a set of predefined exit events, which restricts researchers' ability to dynamically control and monitor simulations. Third, running simulations with multiple workloads requires researchers to write custom external scripts to coordinate multiple gem5 simulations, which creates error-prone and hard-to-reproduce workflows. To overcome these challenges, we introduce several features in gem5 and gem5 Resources. We standardize disk-image creation across x86, ARM, and RISC-V using Packer, and provide validated base images with pre-annotated benchmark suites (NPB, GAPBS). We provide 12 new disk images, 6 new kernels, and over 200 workloads across the three ISAs. We refactor the exit-event system into a class-based model and introduce hypercalls for enhanced guest-host communication, which allow researchers to define custom behavior for their exit events. We also provide a utility to remotely monitor simulations and the gem5-bridge driver for user-space m5 operations. Additionally, we implement Suites and MultiSim to enable parallel full-system simulations from gem5 configuration scripts, eliminating the need for external scripting. These features reduce setup complexity and provide extensible, validated resources that improve reproducibility and standardization.
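To make the idea of class-based exit events with hypercalls concrete, the following is a minimal, self-contained sketch in plain Python. It does not use gem5's API: the class names, the hypercall numbers, and the `run` loop are hypothetical and chosen only for illustration. It assumes that the guest issues a numbered hypercall, the host maps that number to an event class, and the class decides whether the simulation continues.

```python
# Illustrative sketch only: none of these names are gem5's actual API.

class ExitEvent:
    """Base class. handle() returns True to resume simulation, False to stop."""

    def handle(self, log):
        # Conservative default: an unrecognized exit stops the simulation.
        return False


class WorkBeginEvent(ExitEvent):
    def handle(self, log):
        log.append("reset stats")  # e.g. start of a region of interest
        return True


class CheckpointEvent(ExitEvent):
    def handle(self, log):
        log.append("checkpoint")
        return True


class ShutdownEvent(ExitEvent):
    def handle(self, log):
        log.append("shutdown")
        return False


# Hypothetical hypercall numbers mapped to event classes.
HYPERCALL_TABLE = {1: WorkBeginEvent, 2: CheckpointEvent, 3: ShutdownEvent}


def run(hypercalls, table=HYPERCALL_TABLE):
    """Dispatch a sequence of guest hypercall IDs and return the host-side log."""
    log = []
    for hypercall_id in hypercalls:
        event_cls = table.get(hypercall_id, ExitEvent)
        if not event_cls().handle(log):
            break  # the event asked the host to stop simulating
    return log


class VerboseCheckpoint(CheckpointEvent):
    """User-defined behavior: subclass an event and override handle()."""

    def handle(self, log):
        log.append("custom pre-checkpoint hook")
        return super().handle(log)
```

Under these assumptions, custom behavior is added by subclassing an event and swapping it into the table, for example `run([1, 2, 3], {**HYPERCALL_TABLE, 2: VerboseCheckpoint})`, rather than by editing a fixed list of predefined exit handlers.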
