Leveraging Large Language Models for Automated Reproduction of Networking Research Results

Yining Jiang, Yunxin Xu, Wenyun Xu, Yufan Zhu, Tangtang He, Haiying Huang, Letian Zhu, Qingyu Song, Qiang Su, Lizhao You, Lu Tang, Wanjin Feng, Yuchao Zhang, Linghe Kong, Qiao Xiang, Jiwu Shu

TL;DR

This work tackles the reproducibility crisis in networking research by introducing RepLLM, a memory-backed, multi-agent system that converts academic papers into executable networking code. By decomposing the task into Content Parsing, Architecture Design, Code Generation, and Audit & Repair, RepLLM achieves robust paper-to-code synthesis through explicit context sharing and a sandboxed, iterative refinement loop. Empirical results across multiple conferences show that RepLLM outperforms baselines in code reliability and semantic alignment, enabling high-fidelity reproduction with limited human intervention. The framework significantly lowers reproduction costs and provides a scalable path toward transparent, reproducible networking research.

Abstract

Code reproduction is a cornerstone of scientific validity, yet it remains a formidable challenge in computer networking research due to the scarcity of open-source implementations and the complexity of heterogeneous system architectures. While Large Language Models have demonstrated potential in code generation, existing code generation frameworks often fail to address the long-context constraints and intricate logical dependencies required to reproduce network systems from academic papers. To facilitate result reproduction, we introduce RepLLM, an end-to-end multi-agent framework designed to automate the transformation of network research into executable code. RepLLM features a novel collaborative architecture comprising four specialized agents (Content Parsing, Architecture Design, Code Generation, and Audit & Repair) coordinated through an explicit Shared Memory mechanism to ensure global context consistency. With the enhancement of Chain-of-Thought LLM reasoning and a sandbox-isolated static-dynamic debugging methodology, our framework effectively resolves semantic discrepancies and runtime errors. Extensive evaluations on representative papers from SIGCOMM and NSDI demonstrate that RepLLM significantly outperforms state-of-the-art baselines in generating compile-ready and logically correct systems. Results further demonstrate that RepLLM facilitates the reproduction of 80% of the original benchmarks with only four hours of human intervention.
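
The four-agent flow described in the abstract can be illustrated with a toy sketch. This is not RepLLM's implementation: every name below (SharedMemory, run_pipeline, the agent functions) is hypothetical, the agents are plain stub functions rather than LLM calls, and Python's built-in compile() stands in for the sandboxed static-dynamic debugging step. It only shows the shape of the idea: each agent reads and writes one shared store, and the audit step repeats until no failures remain or a round limit is hit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SharedMemory:
    """Hypothetical shared store that every agent reads from and writes to."""
    entries: Dict[str, object] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def write(self, agent: str, key: str, value: object) -> None:
        self.entries[key] = value
        self.log.append(f"{agent} wrote {key}")

    def read(self, key: str) -> object:
        return self.entries[key]


def content_parsing(mem: SharedMemory, paper_text: str) -> None:
    # Stand-in for paper parsing: one "section" per non-empty line.
    sections = [ln.strip() for ln in paper_text.splitlines() if ln.strip()]
    mem.write("content_parsing", "sections", sections)


def architecture_design(mem: SharedMemory) -> None:
    # Stand-in for design: one module name per parsed section.
    modules = [f"module_{i}" for i, _ in enumerate(mem.read("sections"))]
    mem.write("architecture_design", "modules", modules)


def code_generation(mem: SharedMemory) -> None:
    # Stand-in for LLM output; the first module gets an injected syntax fault.
    code = {}
    for i, name in enumerate(mem.read("modules")):
        src = f"def {name}():\n    return {i}\n"
        code[name] = src.replace("return", "retrun") if i == 0 else src
    mem.write("code_generation", "code", code)


def audit(mem: SharedMemory) -> List[str]:
    # Static check only: report modules whose source does not compile.
    failures = []
    for name, src in mem.read("code").items():
        try:
            compile(src, name, "exec")
        except SyntaxError:
            failures.append(name)
    return failures


def repair(mem: SharedMemory, failures: List[str]) -> None:
    # Toy repair rule that undoes the injected fault.
    code = dict(mem.read("code"))
    for name in failures:
        code[name] = code[name].replace("retrun", "return")
    mem.write("audit_repair", "code", code)


def run_pipeline(paper_text: str, max_rounds: int = 3) -> SharedMemory:
    mem = SharedMemory()
    content_parsing(mem, paper_text)
    architecture_design(mem)
    code_generation(mem)
    for _ in range(max_rounds):  # bounded audit-and-repair loop
        failures = audit(mem)
        mem.write("audit_repair", "failures", failures)
        if not failures:
            break
        repair(mem, failures)
    return mem


if __name__ == "__main__":
    memory = run_pipeline("Design\nEvaluation\n")
    print("\n".join(memory.log))
```

Routing every write through one store keeps a trace of which agent produced which artifact, which is one plausible way to read the abstract's "global context consistency"; the actual mechanism is not specified in the text above.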

Paper Structure

This paper contains 71 sections, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Statistics of SIGCOMM and NSDI Papers with an Open-Source Prototype from the Authors (2016-2025).
  • Figure 2: Overview of RepLLM's Multi-Agent Architecture.
  • Figure 3: Code Generation Token Cost of Different Frameworks.
  • Figure 4: Workflow of RepLLM.
  • Figure 5: Cumulative Distribution Function of Relative Total Flow of Ours vs. Official NCFlow.
  • ...and 3 more figures