Enhancing Code Consistency in AI Research with Large Language Models and Retrieval-Augmented Generation
Rajat Keshri, Arun George Zachariah, Michael Boone
TL;DR
This paper addresses reproducibility gaps in AI research by automating verification of code implementations against descriptions in research papers using Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs). The proposed system extracts and aligns model architectures, hyperparameters, algorithms, and data processing steps from papers and code, and provides a structured discrepancy report. By storing paper and code representations in separate vector stores and performing targeted retrieval, the approach reduces manual verification effort and increases transparency and reproducibility. The work has practical impact for researchers, reviewers, and conferences by providing a scalable, automated tool for validating code integrity against published methodologies.
Abstract
Ensuring that code accurately reflects the algorithms and methods described in research papers is critical for maintaining credibility and fostering trust in AI research. This paper presents a novel system designed to verify code implementations against the algorithms and methodologies outlined in corresponding research papers. Our system employs Retrieval-Augmented Generation to extract relevant details from both the research papers and code bases, followed by a structured comparison using Large Language Models. This approach improves the accuracy and comprehensiveness of code implementation verification while contributing to the transparency, explainability, and reproducibility of AI research. By automating the verification process, our system reduces manual effort, enhances research credibility, and ultimately advances the state of the art in code verification.
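To make the pipeline summarized above more concrete, the following is a minimal sketch of its three stages: separate stores for paper and code chunks, targeted retrieval per aspect, and a structured discrepancy report. It is an illustration under stated assumptions, not the authors' implementation. The bag-of-words `embed` function stands in for a real embedding model, `compare` stands in for the LLM-based comparison, and all names (`VectorStore`, `Finding`, `ASPECT_QUERIES`, `verify`) and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Two of the aspects named in the text; the query strings are assumptions.
ASPECT_QUERIES = {
    "model architecture": "model architecture layers encoder hidden",
    "hyperparameters": "hyperparameters learning rate batch size epochs",
}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory store; paper and code each get their own instance."""

    def __init__(self, chunks):
        self.items = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query: str, k: int = 1):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


@dataclass
class Finding:
    aspect: str
    paper_evidence: str
    code_evidence: str
    consistent: bool


def compare(paper_text: str, code_text: str) -> bool:
    """Placeholder for the LLM comparison: checks that numeric values agree."""
    pattern = r"\d+(?:\.\d+)?"
    return set(re.findall(pattern, paper_text)) == set(re.findall(pattern, code_text))


def verify(paper_store: VectorStore, code_store: VectorStore):
    """Retrieve evidence per aspect from each store and record one finding each."""
    report = []
    for aspect, query in ASPECT_QUERIES.items():
        paper_hit = paper_store.retrieve(query)[0]
        code_hit = code_store.retrieve(query)[0]
        report.append(Finding(aspect, paper_hit, code_hit, compare(paper_hit, code_hit)))
    return report


# Hypothetical sample chunks, invented for illustration only.
paper_chunks = [
    "The model architecture is an encoder with 6 layers.",
    "Hyperparameters: learning rate 0.001 and batch size 32.",
]
code_chunks = [
    "encoder = Encoder(num_layers=6)  # model architecture",
    "learning_rate = 0.01; batch_size = 32  # hyperparameters",
]

report = verify(VectorStore(paper_chunks), VectorStore(code_chunks))
for finding in report:
    status = "consistent" if finding.consistent else "DISCREPANCY"
    print(f"{finding.aspect}: {status}")
```

In this toy run the learning rate differs between the paper chunk and the code chunk, so the hyperparameters aspect is flagged. Keeping the two stores separate means each query retrieves paper evidence and code evidence independently, so neither source can crowd the other out of the top results.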
