NETREPLICA: Toward a Programmable Substrate for Last-Mile Data Generation
Jaber Daneshamooz, Satyandra Guthula, Jessica Nguyen, William Chen, Sanjay Chandrasekaran, Ankit Gupta, Arpit Gupta, Walter Willinger
TL;DR
NETREPLICA is a programmable substrate for last-mile data generation. It decouples static bottleneck attributes from dynamic cross traffic through Cross-Traffic Profiles (CTPs), enabling data that is realistic, tunable, diverse, composable, and replicable. The design combines three abstractions (Link, Bottleneck, CrossTraffic) with a four-stage CTP pipeline and a hybrid replay model across multiple backends, preserving fidelity while expanding coverage beyond production traces. In an adaptive bitrate (ABR) streaming case study, augmenting production data with NETREPLICA traces reduces transmission-time prediction error by up to 47% in challenging slow-path domains, and further analyses demonstrate strong replicability and scalable data generation. Overall, NETREPLICA is a practical first step toward a fully programmable data-generation substrate for networking, with implications for reproducible research, fair benchmarking, and robust machine-learning model training in last-mile networking.
Abstract
Last-mile access networks are often the dominant bottlenecks for Internet applications, creating demand for data-generation approaches that are both realistic and reusable. Meeting this goal requires five properties: fidelity (capturing real network behaviors), controllability (systematic variation of network conditions), diversity (coverage of heterogeneous network behaviors), composability (construction of complex scenarios from simpler elements), and replicability (consistent outcomes across runs). Existing approaches satisfy only a subset of these requirements. This paper introduces NETREPLICA, a programmable substrate for last-mile data generation that achieves all five. NETREPLICA decomposes bottlenecks into static attributes (capacity, base latency, buffer size, shaping and active queue management policies) and dynamic attributes (cross traffic) derived from passive traces. It introduces Cross-Traffic Profiles (CTPs) that transform passive production traces into reusable, parameterizable building blocks. By trimming, scaling, and recombining CTPs, NETREPLICA generates realistic yet tunable conditions, replaying non-reactive cross traffic alongside reactive application workloads and enabling reproducible construction of heterogeneous scenarios. In a case study on adaptive bitrate streaming, models trained with NETREPLICA-generated traces reduced transmission-time prediction error by up to 47% in challenging slow-path domains (>=400 ms RTT, <=6 Mbps throughput) compared to models trained solely on production traces, demonstrating the utility of NETREPLICA-generated data. Overall, NETREPLICA represents a first step toward a fully programmable data-generation substrate for networking.
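To make the static/dynamic decomposition concrete, the following is a minimal illustrative sketch in Python, not NETREPLICA's actual API. It assumes a CTP can be approximated as cross-traffic bytes per fixed-width time bin; the class names, field names, and the `trim`, `scale`, and `combine` methods are hypothetical stand-ins for the trimming, scaling, and recombining operations described above, and all numeric values are made up for illustration.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Tuple


@dataclass(frozen=True)
class Bottleneck:
    """Static bottleneck attributes (hypothetical field names)."""
    capacity_mbps: float
    base_latency_ms: float
    buffer_bytes: int
    aqm: str = "fifo"  # assumed label for the queue management policy


@dataclass(frozen=True)
class CrossTrafficProfile:
    """Assumed CTP representation: cross-traffic bytes per time bin."""
    bin_s: float
    bytes_per_bin: Tuple[int, ...]

    def trim(self, start_bin: int, end_bin: int) -> "CrossTrafficProfile":
        # Keep only the bins in [start_bin, end_bin).
        return CrossTrafficProfile(self.bin_s, self.bytes_per_bin[start_bin:end_bin])

    def scale(self, factor: float) -> "CrossTrafficProfile":
        # Multiply the offered load in every bin by a constant factor.
        scaled = tuple(round(b * factor) for b in self.bytes_per_bin)
        return CrossTrafficProfile(self.bin_s, scaled)

    def combine(self, other: "CrossTrafficProfile") -> "CrossTrafficProfile":
        # Superimpose two profiles bin by bin; the shorter one is zero-padded.
        if self.bin_s != other.bin_s:
            raise ValueError("profiles must share the same bin width")
        summed = tuple(
            a + b
            for a, b in zip_longest(self.bytes_per_bin, other.bytes_per_bin, fillvalue=0)
        )
        return CrossTrafficProfile(self.bin_s, summed)

    def mean_rate_mbps(self) -> float:
        # Average offered load over the whole profile, in Mbps.
        if not self.bytes_per_bin:
            return 0.0
        total_bits = 8 * sum(self.bytes_per_bin)
        return total_bits / (len(self.bytes_per_bin) * self.bin_s) / 1e6


# Illustrative values only: a 6 Mbps link plus two made-up profiles.
link = Bottleneck(capacity_mbps=6.0, base_latency_ms=200.0, buffer_bytes=64_000)
ctp_a = CrossTrafficProfile(1.0, (125_000, 250_000, 375_000, 500_000))
ctp_b = CrossTrafficProfile(1.0, (125_000, 125_000))

# Trim ctp_a to two bins, halve its load, then overlay ctp_b.
scenario = ctp_a.trim(1, 3).scale(0.5).combine(ctp_b)
utilization = scenario.mean_rate_mbps() / link.capacity_mbps
```

In this sketch the static `Bottleneck` stays fixed while the cross-traffic scenario is varied independently, which is the separation the abstract describes. Frozen dataclasses make each derived profile a new immutable value, so the same chain of operations yields the same scenario on every run.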
