MuchiSim: A Simulation Framework for Design Exploration of Multi-Chip Manycore Systems
Marcelo Orenes-Vera, Esin Tureci, Margaret Martonosi, David Wentzlaff
TL;DR
MuchiSim addresses the challenge of design-space exploration for scale-out, communication-intensive multi-chiplet manycore systems. It is a scalable, host-executed simulator that models data movement and network traffic cycle by cycle and also provides energy, area, and cost estimates. The framework supports various tile-based architectures, memory hierarchies, and inter-chip interconnects, along with multiple parallelization strategies and a benchmark suite for assessing performance across diverse communication-intensive workloads. Key contributions include a novel distributed manycore performance model, detailed energy, area, and cost models for multi-chip modules and interposers, and visualization tools that aid comparative analysis. The framework is validated against real hardware and demonstrated at large scales. The results show strong validation, linear or near-linear scaling of simulation speed with host threads, and actionable insights for tuning memory, compute, and network resources, making MuchiSim a practical open-source tool for architecture research and design optimization.
Abstract
The design-space exploration of scaled-out manycores for communication-intensive applications (e.g., graph analytics and sparse linear algebra) is hampered because existing frameworks lack either the scalability or the accuracy needed to simulate data-dependent execution patterns. This paper presents MuchiSim, a novel parallel simulator designed to address these challenges when exploring the design space of distributed multi-chiplet manycore architectures. We evaluate MuchiSim by simulating systems with up to a million interconnected processing units (PUs) while modeling data movement and communication cycle by cycle. In addition to performance, MuchiSim reports the energy, area, and cost of the simulated system. It also comes with a benchmark application suite and two data visualization tools. MuchiSim supports various parallelization strategies and communication primitives, such as task-based parallelization and message passing, making it highly relevant for architectures with software-managed coherence and distributed memory. Through a case study, we show that MuchiSim helps users explore the balance between memory and computation units, as well as the constraints related to chiplet integration and inter-chip communication. MuchiSim enables the evaluation of new techniques and design parameters at scales that are more realistic for modern parallel systems, opening the door to further research in this area.
