RESCORE: LLM-Driven Simulation Recovery in Control Systems Research Papers

Vineet Bhat, Shiqing Wei, Ali Umut Kaypak, Prashanth Krishnamurthy, Ramesh Karri, Farshad Khorrami

Abstract

Reconstructing numerical simulations from control systems research papers is often hindered by underspecified parameters and ambiguous implementation details. We define the task of Paper-to-Simulation Recoverability: the ability of an automated system to generate executable code that faithfully reproduces a paper's results. We curate a benchmark of 500 papers from the IEEE Conference on Decision and Control (CDC) and propose RESCORE, a three-component LLM agentic framework consisting of Analyzer, Coder, and Verifier agents. RESCORE uses iterative execution feedback and visual comparison to improve reconstruction fidelity. Our method successfully recovers task-coherent simulations for 40.7% of benchmark instances, outperforming single-pass generation. Notably, the automated RESCORE pipeline achieves an estimated 10X speedup over manual human replication, drastically cutting the time and effort required to verify published control methodologies. We will release our benchmark and agents to foster community progress in automated research replication.

Paper Structure

This paper contains 23 sections, 9 figures, and 4 tables.

Figures (9)

  • Figure B1: RESCORE framework automates code recovery from control system papers. After filtering, expert screening, and annotation, Analyzer, Coder, and Verifier LLM agents operate in a closed loop to generate, execute, and refine simulation code using feedback, followed by evaluation.
  • Figure B2: Prompt for Analyzer Agent. The agent performs two tasks: (1) transcribe red-boxed equations into a readable format, and (2) analyze and describe the behavior of the system using simulations from the paper. problem_statement, params, and init_conditions are defined by the domain expert during annotation.
  • Figure B3: Prompt for Coder Agent. The agent performs both initial code generation and iterative code repair. In repair mode, the current code and the feedback agent's visual diagnosis are appended to the user message.
  • Figure B4: Prompt for Verifier Agent. The agent receives the generated and target plots side-by-side. A confirmed match terminates the loop; otherwise a structured diagnosis feeds back into the Coder Agent for the next iteration.
  • Figure C1: Confusion matrices: Human raters vs. LLM grader on RESCORE outputs. Off-diagonal mass above the diagonal shows that the LLM grader is optimistic relative to human raters.
  • ...and 4 more figures
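
The closed loop described in the abstract and Figure B1 (generate, execute, verify against target plots, repair from feedback) can be sketched as follows. This is a minimal illustration only: the function names, the feedback format, and the stopping criterion are assumptions for exposition, not the paper's actual implementation, and the three agents are stubbed where RESCORE would invoke LLMs.

```python
# Hypothetical sketch of the Analyzer -> Coder -> Verifier loop from Figure B1.
# All names and the feedback format are illustrative assumptions, not the
# paper's implementation; each agent is a stub where an LLM call would go.

def analyzer(paper_text):
    # Transcribe equations and describe system behavior (stubbed).
    return {"equations": paper_text, "description": "system dynamics"}

def coder(analysis, prior_code=None, feedback=None):
    # Initial generation on the first call; iterative repair when the
    # Verifier's diagnosis is supplied.
    if feedback is None:
        return f"# simulation for: {analysis['description']}"
    return prior_code + f"\n# repaired per feedback: {feedback}"

def verifier(code, attempt, max_attempts):
    # Would compare generated vs. target plots; here the stub accepts
    # only on the final attempt so the repair path is exercised.
    matched = attempt == max_attempts
    return matched, None if matched else "trajectories diverge from target"

def rescore_loop(paper_text, max_attempts=3):
    analysis = analyzer(paper_text)
    code, feedback = coder(analysis), None
    for attempt in range(1, max_attempts + 1):
        matched, feedback = verifier(code, attempt, max_attempts)
        if matched:
            return code, attempt
        code = coder(analysis, prior_code=code, feedback=feedback)
    return code, max_attempts

code, attempts = rescore_loop("dx/dt = -x")
print(attempts)  # -> 3 (the stub verifier accepts on the final attempt)
```

The key structural point, mirrored from the paper's description, is that the Verifier's diagnosis is routed back into the Coder rather than restarting generation from scratch, so each iteration edits the prior code.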