SplitSim: Large-Scale Simulations for Evaluating Network Systems Research
Hejing Li, Praneeth Balasubramanian, Marvin Meiers, Jialin Li, Antoine Kaufmann
TL;DR
SplitSim addresses the problem of evaluating large-scale networked systems when physical testbeds are infeasible by making end-to-end simulation practical at scale. It combines mixed-fidelity modeling, decomposition-based parallelization, lightweight synchronization profiling, and a Python-based orchestration framework that connects heterogeneous simulators via modular adapters. Across case studies (in-network processing, clock synchronization, and congestion control), the framework demonstrates substantial resource savings and close fidelity to full end-to-end simulations, enabling 20 s of simulated time in under four hours on a single machine. This lowers the barrier to end-to-end evaluation and makes SplitSim a practical, extensible tool for researchers and practitioners.
Abstract
When physical testbeds are out of reach for evaluating a networked system, we frequently turn to simulation. In today's datacenter networks, bottlenecks are rarely at the network protocol level, but instead in end-host software or hardware components; current protocol-level simulations are therefore an inadequate means of evaluation. End-to-end simulations covering these components, on the other hand, simply cannot achieve the required scale with feasible simulation performance and computational resources. In this paper, we address this with SplitSim, a simulation framework for end-to-end evaluation of large-scale network and distributed systems. SplitSim builds on prior work on modular end-to-end simulations and combines it with four key elements to achieve scalability. First, mixed-fidelity simulations judiciously reduce detail in the simulation of parts of the system where this can be tolerated, while retaining the necessary detail elsewhere. Second, SplitSim parallelizes bottleneck simulators by decomposing them into multiple parallel but synchronized processes. Third, SplitSim provides a profiler that helps users understand simulation performance and locate bottlenecks, so that they can adjust the configuration. Finally, SplitSim provides abstractions that make it easy for users to build complex large-scale simulations. Our evaluation demonstrates SplitSim in multiple large-scale case studies.
