A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks

Waleed Khalid; Dmitry Ignatov; Radu Timofte

TL;DR

NN-RAG tackles the fragmentation of PyTorch code across repositories with a retrieval-augmented pipeline that constructs dependency-closed, executable neural modules with provenance. It emphasizes import-preserving regeneration and validator-gated promotion, and uses neutral specifications to optionally guide LLM-based synthesis without redistributing code. On 19 repositories, it extracts 1,289 blocks and validates 941 (73.0%) as runnable, finds that NN-RAG contributes the majority of unique architectures in the LEMUR dataset, and enables cross-repository migration of designs. The approach improves reproducibility, scalability, and transparency in neural-architecture reuse, providing a practical substrate for ablations and architectural discovery while avoiding the redistribution of third-party weights.
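To make "validator-gated promotion" concrete, the following is a minimal sketch, not the NN-RAG implementation. It assumes a candidate block arrives as a source string plus the name of the symbol it should define, and it promotes the block only if the source compiles, executes in a fresh namespace, and defines that symbol. The names `ValidationResult` and `validate_block` are hypothetical. A real validator for PyTorch modules would presumably also instantiate the module and run a forward pass, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    name: str
    promoted: bool
    reason: str


def validate_block(name: str, source: str) -> ValidationResult:
    """Hypothetical gate: promote only if the block compiles, runs, and defines `name`."""
    # Stage 1: the regenerated source must be syntactically valid.
    try:
        code = compile(source, filename=f"<{name}>", mode="exec")
    except SyntaxError as exc:
        return ValidationResult(name, False, f"syntax error: {exc.msg}")

    # Stage 2: execute in an empty namespace so that missing imports or
    # unresolved base classes surface as errors instead of leaking from the host.
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception as exc:
        return ValidationResult(name, False, f"{type(exc).__name__}: {exc}")

    # Stage 3: the block must actually define the symbol it claims to provide.
    if name not in namespace:
        return ValidationResult(name, False, "symbol not defined")
    return ValidationResult(name, True, "ok")
```

Note that executing a definition does not run method bodies, so a name that is unresolved only inside `forward` would pass this gate; catching that case requires calling the module, which is why a runtime check on sample input is the natural next stage.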

Abstract

Reusing existing neural-network components is central to research efficiency, yet discovering, extracting, and validating such modules across thousands of open-source repositories remains difficult. We introduce NN-RAG, a retrieval-augmented generation system that converts large, heterogeneous PyTorch codebases into a searchable and executable library of validated neural modules. Unlike conventional code search or clone-detection tools, NN-RAG performs scope-aware dependency resolution, import-preserving reconstruction, and validator-gated promotion -- ensuring that every retrieved block is scope-closed, compilable, and runnable. Applied to 19 major repositories, the pipeline extracted 1,289 candidate blocks, validated 941 (73.0%), and demonstrated that over 80% are structurally unique. Through multi-level de-duplication (exact, lexical, structural), we find that NN-RAG contributes the overwhelming majority of unique architectures to the LEMUR dataset, supplying approximately 72% of all novel network structures. Beyond quantity, NN-RAG uniquely enables cross-repository migration of architectural patterns, automatically identifying reusable modules in one project and regenerating them, dependency-complete, in another context. To our knowledge, no other open-source system provides this capability at scale. The framework's neutral specifications further allow optional integration with language models for synthesis or dataset registration without redistributing third-party code. Overall, NN-RAG transforms fragmented vision code into a reproducible, provenance-tracked substrate for algorithmic discovery, offering a first open-source solution that both quantifies and expands the diversity of executable neural architectures across repositories.
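The three de-duplication levels named above (exact, lexical, structural) can be illustrated with a short sketch. This is an assumption about what each level could mean, not the paper's definition: here "exact" hashes the raw source, "lexical" hashes the token stream with comments and whitespace differences removed, and "structural" hashes the AST after renaming identifiers in order of first appearance. All function names are hypothetical, and attribute names such as `Conv2d` in `nn.Conv2d` are deliberately left intact so that different layer types stay distinct.

```python
import ast
import hashlib
import io
import tokenize


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def exact_key(source: str) -> str:
    """Level 1: byte-identical sources collide."""
    return _digest(source)


def lexical_key(source: str) -> str:
    """Level 2: ignore comments, blank lines, and spacing; keep token order."""
    layout = (tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE, tokenize.ENDMARKER)
    parts = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue
        # Layout tokens are recorded by kind so indentation width does not matter.
        parts.append(tokenize.tok_name[tok.type] if tok.type in layout else tok.string)
    return _digest(" ".join(parts))


class _Canonicalize(ast.NodeTransformer):
    """Rename variables, arguments, functions, and classes to _v0, _v1, ..."""

    def __init__(self) -> None:
        self.names: dict[str, str] = {}

    def _canon(self, name: str) -> str:
        return self.names.setdefault(name, f"_v{len(self.names)}")

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self._canon(node.id)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._canon(node.arg)
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._canon(node.name)
        self.generic_visit(node)
        return node

    def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
        node.name = self._canon(node.name)
        self.generic_visit(node)
        return node


def structural_key(source: str) -> str:
    """Level 3: sources that differ only in identifier names collide."""
    tree = _Canonicalize().visit(ast.parse(source))
    return _digest(ast.dump(tree))


def unique_counts(sources: list[str]) -> dict[str, int]:
    """Count distinct blocks remaining at each de-duplication level."""
    return {
        "exact": len({exact_key(s) for s in sources}),
        "lexical": len({lexical_key(s) for s in sources}),
        "structural": len({structural_key(s) for s in sources}),
    }
```

Each level is strictly coarser than the previous one, so the unique count can only stay the same or shrink from exact to lexical to structural. Comparing the three counts gives a rough picture of how much apparent diversity in a corpus comes from formatting or renaming rather than from genuinely different architectures.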
