NETREPLICA: Toward a Programmable Substrate for Last-Mile Data Generation

Jaber Daneshamooz, Satyandra Guthula, Jessica Nguyen, William Chen, Sanjay Chandrasekaran, Ankit Gupta, Arpit Gupta, Walter Willinger

TL;DR

NETREPLICA presents a programmable substrate for last-mile data generation that decouples static bottleneck attributes from dynamic cross-traffic via Cross-Traffic Profiles, enabling realistic, tunable, diverse, composable, and replicable data. The design combines three abstractions (Link, Bottleneck, CrossTraffic) with a four-stage CTP pipeline and a hybrid replay model across multiple backends, achieving fidelity while expanding coverage beyond production traces. In an adaptive bitrate (ABR) streaming case study, augmenting production data with NETREPLICA traces reduces transmission-time prediction error by up to 47% in challenging slow-path domains, and analyses demonstrate strong replicability and scalable data generation. Overall, NETREPLICA constitutes a practical first step toward a fully programmable data-generation substrate for networking, with significant implications for reproducible research, fair benchmarking, and robust machine-learning model training in last-mile networking.

Abstract

Last-mile access networks are often the dominant bottlenecks for Internet applications, creating demand for data-generation approaches that are both realistic and reusable. Meeting this goal requires five properties: fidelity (capturing real network behaviors), controllability (systematic variation of network conditions), diversity (coverage of heterogeneous network behaviors), composability (construction of complex scenarios from simpler elements), and replicability (consistent outcomes across runs). Existing approaches satisfy only a subset of these requirements. This paper introduces NETREPLICA, a programmable substrate for last-mile data generation that achieves all five. NETREPLICA decomposes bottlenecks into static attributes (capacity, base latency, buffer size, shaping and active queue management policies) and dynamic attributes derived from passive traces. It introduces Cross-Traffic Profiles (CTPs) that transform passive production traces into reusable, parameterizable building blocks. By trimming, scaling, and recombining CTPs, NETREPLICA generates realistic yet tunable conditions, replaying non-reactive cross traffic alongside reactive application workloads and enabling reproducible construction of heterogeneous scenarios. In a case study on adaptive bitrate streaming, models trained with NETREPLICA-generated traces reduced transmission-time prediction error by up to 47% in challenging slow-path domains (>=400 ms RTT, <=6 Mbps throughput) compared to models trained solely on production traces, demonstrating the utility of NETREPLICA-generated data. Overall, NETREPLICA represents a first step toward a fully programmable data-generation substrate for networking.
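
To make the static/dynamic decomposition concrete, the following is a minimal Python sketch of how a bottleneck's static attributes might be kept separate from a cross-traffic profile that can be trimmed, scaled, and recombined. It is not NETREPLICA's actual API: the class names, field names, and methods are hypothetical, the example values are illustrative, and scaling by time compression is an assumption, since the abstract does not say how scaling is performed.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical packet record: (timestamp in seconds, size in bytes).
Packet = Tuple[float, int]


@dataclass(frozen=True)
class Bottleneck:
    """Static attributes of one bottleneck link (field names are assumptions)."""
    capacity_mbps: float
    base_latency_ms: float
    buffer_pkts: int
    shaping: str = "none"
    aqm: str = "fifo"


@dataclass(frozen=True)
class CrossTrafficProfile:
    """Dynamic attributes: a non-reactive packet schedule taken from a passive trace."""
    packets: Tuple[Packet, ...]

    def trim(self, start: float, end: float) -> "CrossTrafficProfile":
        # Keep packets in [start, end) and rebase timestamps so the window starts at 0.
        kept = tuple((t - start, size) for t, size in self.packets if start <= t < end)
        return CrossTrafficProfile(kept)

    def scale(self, factor: float) -> "CrossTrafficProfile":
        # Assumed scaling rule: compress time, so factor 2.0 doubles the average rate.
        if factor <= 0:
            raise ValueError("factor must be positive")
        return CrossTrafficProfile(tuple((t / factor, size) for t, size in self.packets))

    def combine(self, other: "CrossTrafficProfile") -> "CrossTrafficProfile":
        # Superimpose two profiles into one schedule ordered by timestamp.
        return CrossTrafficProfile(tuple(sorted(self.packets + other.packets)))

    def total_bytes(self) -> int:
        return sum(size for _, size in self.packets)


# Illustrative values only, not taken from the paper.
trace = CrossTrafficProfile(((0.0, 1000), (1.0, 1000), (2.0, 1000), (3.0, 1000)))
window = trace.trim(1.0, 3.0)   # reuse a two-second slice of the trace
busy = window.scale(2.0)        # same packets, twice the rate
scenario = (Bottleneck(capacity_mbps=6.0, base_latency_ms=200.0, buffer_pkts=100),
            window.combine(busy))
```

Because both classes are immutable, the same static `Bottleneck` can be paired with many derived profiles, and a derived profile can be reused across bottlenecks, which is the kind of composability and replicability the abstract describes.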

Paper Structure

This paper contains 29 sections, 18 figures, 4 tables.

Figures (18)

  • Figure 1: NetReplica realizes a data-generation thin waist for last-mile networks. It provides programmable interfaces to express diverse data-generation intents, represent last-mile networks as abstract topologies composed of one or more bottleneck links, decouple static and dynamic bottleneck attributes, and map these specifications onto one or more physical/virtual network infrastructures.
  • Figure 2: Effect of targeted data generation on the performance of the model: performance improvement with NetReplica-generated training data.
  • Figure 3: Prototype implementation of NetReplica.
  • Figure 4: Characteristics of cross-traffic profiles synthesized from the 15-minute packet traces collected at the campus network gateway router.
  • Figure 5: Visualization of throughput of four cross-traffic profiles with different intensity, burstiness, and heterogeneity.
  • ...and 13 more figures