CodeRefine: A Pipeline for Enhancing LLM-Generated Code Implementations of Research Papers

Ekaterina Trofimova; Emil Sataev; Abhijit Singh Jowhari

TL;DR

CodeRefine targets the problem of converting methodological descriptions in scientific papers into executable code by combining LLM-driven processing with a knowledge-graph ontology and a retrospective retrieval-augmented generation framework. The pipeline segments papers into text chunks, filters for code-relevant content, builds a knowledge graph, generates intermediate code with GPT-4o, and refines it via RRAG using task-aware vector embeddings; final outputs are evaluated against ground-truth code using a Tree-based Structural Edit Distance metric defined as $TSED = \max\{1- \frac{TED}{MaxNodes(T_1, T_2)}, 0\}$. Experiments on five papers show that RRAG, when supplied with the paper and its references in a dynamic database, improves code similarity compared to vanilla prompting, though penalty weights for code edits are paper-dependent and not universal. This approach offers a practical step toward reliable automated code synthesis from scientific text, with potential impact on accelerating the adoption of cutting-edge algorithms and guiding future tool development in research workflows.
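The TSED formula above can be illustrated with a minimal Python sketch. It assumes the tree edit distance (TED) between the two syntax trees has already been computed by some external algorithm, and it reads $MaxNodes(T_1, T_2)$ as the larger of the two trees' node counts. The helper names `count_nodes` and `tsed` are hypothetical, and Python's built-in `ast` module stands in for whatever parser the authors actually use.

```python
import ast


def count_nodes(source: str) -> int:
    """Count the nodes in the Python AST of `source` (illustrative parser choice)."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


def tsed(ted: float, nodes_t1: int, nodes_t2: int) -> float:
    """Compute TSED = max(1 - TED / MaxNodes(T1, T2), 0).

    `ted` is a precomputed tree edit distance; MaxNodes is assumed to be
    the larger of the two node counts.
    """
    max_nodes = max(nodes_t1, nodes_t2)
    if max_nodes <= 0:
        raise ValueError("at least one tree must contain a node")
    # Clamp at zero so that a very large edit distance never yields a negative score.
    return max(1.0 - ted / max_nodes, 0.0)


if __name__ == "__main__":
    generated = "def add(a, b):\n    return a + b\n"
    reference = "def add(x, y):\n    total = x + y\n    return total\n"
    n1, n2 = count_nodes(generated), count_nodes(reference)
    # The TED value here is a placeholder, not a measured distance.
    print(tsed(ted=4, nodes_t1=n1, nodes_t2=n2))
```

Identical trees (TED of 0) score 1, and the clamp keeps the score at 0 once the edit distance exceeds the size of the larger tree. Because TED depends on the penalty weights assigned to each edit operation, the same pair of programs can score differently under different weightings.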

Abstract

This paper presents CodeRefine, a novel framework for automatically transforming research paper methodologies into functional code using Large Language Models (LLMs). Our multi-step approach first extracts and summarizes key text chunks from papers, analyzes their code relevance, and creates a knowledge graph using a predefined ontology. Code is then generated from this structured representation and enhanced through a proposed retrospective retrieval-augmented generation approach. CodeRefine addresses the challenge of bridging theoretical research and practical implementation, offering a more accurate alternative to LLM zero-shot prompting. Evaluations on diverse scientific papers demonstrate CodeRefine's ability to improve code implementation from the paper, potentially accelerating the adoption of cutting-edge algorithms in real-world applications.

Paper Structure (15 sections, 2 equations, 3 figures, 3 tables)

Introduction
Related Works
Metric
Methodology
Experiments and Results
Task-Aware Vectorizer
TSED penalty weight optimization
The penalty weights are not unique
Limitations
Downstream tasks
Conclusion
Acknowledgments
Disclosure of Interests
Ontology for the Knowledge Graph Creation
Test papers

Figures (3)

Figure 1: CodeRefine pipeline scheme: Input paper is processed through Llama3-70B, Ontology, and GPT-4o to generate the final code.
Figure 2: Retrospective Retrieval Augmented Generation scheme.
Figure 3: Research helper utilizing the aforementioned dataset.
