SplitSim: Large-Scale Simulations for Evaluating Network Systems Research

Hejing Li; Praneeth Balasubramanian; Marvin Meiers; Jialin Li; Antoine Kaufmann

TL;DR

SplitSim tackles the challenge of evaluating large-scale networked systems when physical testbeds are infeasible by delivering end-to-end simulations at scale. It combines mixed-fidelity modeling, decomposition-based parallelization, lightweight synchronization profiling, and a Python-based orchestration framework that connects heterogeneous simulators via modular adapters. Across case studies (in-network processing, clock synchronization, and congestion control), the framework demonstrates substantial resource savings and close fidelity to full end-to-end simulations, enabling 20s of simulated time in under four hours on a single machine. This lowers the barrier to robust end-to-end evaluation and positions SplitSim as a practical, extensible tool for researchers and practitioners alike.

Abstract

When physical testbeds are out of reach for evaluating a networked system, we frequently turn to simulation. In today's datacenter networks, bottlenecks are rarely at the network protocol level, but instead in end-host software or hardware components, so current protocol-level simulations are an inadequate means of evaluation. End-to-end simulations covering these components, on the other hand, simply cannot achieve the required scale with feasible simulation performance and computational resources. In this paper, we address this with SplitSim, a simulation framework for end-to-end evaluation of large-scale network and distributed systems. SplitSim builds on prior work on modular end-to-end simulations and combines it with four key elements to achieve scalability. First, mixed-fidelity simulations judiciously reduce detail in the simulation of parts of the system where this can be tolerated, while retaining the necessary detail elsewhere. Second, SplitSim parallelizes bottleneck simulators by decomposing them into multiple parallel but synchronized processes. Third, SplitSim provides a profiler that helps users understand simulation performance and locate bottlenecks, so they can adjust the configuration. Finally, SplitSim provides abstractions that make it easy for users to build complex large-scale simulations. Our evaluation demonstrates SplitSim in multiple large-scale case studies.
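The profiling idea in the abstract can be illustrated with a minimal sketch. It assumes, as the caption of Figure 4 suggests, that in a synchronized simulation the process that spends the least time waiting on its peers is the immediate bottleneck. The function name, the input format, the `host.h0` process, and all numbers below are hypothetical and are not SplitSim's actual API or data.

```python
from typing import Dict, List, Tuple


def rank_by_wait_fraction(
    wait_cycles: Dict[str, int], total_cycles: Dict[str, int]
) -> List[Tuple[str, float]]:
    """Return (process, wait fraction) pairs, least-waiting first.

    Assumption: a process that rarely waits for synchronization is the
    one its peers are waiting on, so it is the likely bottleneck.
    """
    fractions = {
        name: wait_cycles[name] / total_cycles[name] for name in wait_cycles
    }
    return sorted(fractions.items(), key=lambda item: item[1])


# Hypothetical measurements, not taken from the paper.
wait = {"net.np0": 50, "net.np1": 120, "net.np2": 150, "host.h0": 900}
total = {"net.np0": 1000, "net.np1": 1000, "net.np2": 1000, "host.h0": 1000}

ranking = rank_by_wait_fraction(wait, total)
bottleneck = ranking[0][0]
print(bottleneck)
```

Processes whose wait fraction is only slightly above the top entry are the next candidates to become bottlenecks once the first one is sped up or decomposed further.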

Paper Structure (50 sections, 11 figures, 1 table)

This paper contains 50 sections, 11 figures, 1 table.

Introduction
Background and Motivation
Requirements
Existing Simulators Fall Short
Technical Challenges
High resource needs for detailed simulators.
Simulations bottlenecked by slowest component.
Hard to understand simulation performance.
Complex configuration and execution.
Design and Implementation
Mixed-Fidelity Simulations
Reducing Simulation Detail in non-Critical Components.
Enabling Mixed-Fidelity End-to-End Simulations.
New Challenges.
Parallelizing Through Decomposition
...and 35 more sections

Figures (11)

Figure 1: SplitSim overview
Figure 2: Changing an end-to-end simulation into a mixed-fidelity simulation by simulating clients at the protocol level in ns-3 instead of using individual host and NIC simulator instances.
Figure 3: Parallelizing a sequential multicore architecture simulation by splitting it into parallel processes interconnected with SplitSim adapters.
Figure 4: Example of a generated wait-time-profile graph. Here the net.np0 process is the immediate bottleneck, but np1-3 are close behind, judging from their waiting numbers.
Figure 5: Comparing NetCache and Pegasus throughput, with different simulation configurations.
...and 6 more figures
