Retrieval Augmented Generation Integrated Large Language Models in Smart Contract Vulnerability Detection

Jeffy Yu

TL;DR

This work investigates democratizing smart contract security auditing by integrating Retrieval-Augmented Generation (RAG) with large language models (LLMs), specifically GPT-4-1106, to detect vulnerabilities in DeFi contracts. The author builds a vector store of 830 known vulnerable contracts using Pinecone and OpenAI embeddings, and evaluates the RAG-LLM pipeline in a two-phase experiment with guided and blind prompts. Phase One (guided, vulnerability type supplied in the prompt) achieves a 62.7% success rate, while Phase Two (blind, no vulnerability type supplied) achieves 60.71%, suggesting promising generalization but also highlighting variability and the continued need for human review. The study argues that RAG-LLMs can lower auditing costs and broaden access, while noting limitations related to data integrity, prompt compliance, context handling, and ethical considerations, and it outlines a path toward scalable, responsible deployment in real-world DeFi security workflows.

Abstract

The rapid growth of Decentralized Finance (DeFi) has been accompanied by substantial financial losses due to smart contract vulnerabilities, underscoring the critical need for effective security auditing. With attacks becoming more frequent, the demand for auditing services has escalated. This creates a particular financial burden for independent developers and small businesses, who often have limited funding for these services. Our study builds upon existing frameworks by integrating Retrieval-Augmented Generation (RAG) with large language models (LLMs), specifically employing GPT-4-1106 for its 128k-token context window. We construct a vector store of 830 known vulnerable contracts, using Pinecone for vector storage, OpenAI's text-embedding-ada-002 for embeddings, and LangChain to build the RAG-LLM pipeline. Prompts were designed to elicit a binary answer for vulnerability detection. First, we test 52 smart contracts 40 times each against a provided vulnerability type, verifying the replicability and consistency of the RAG-LLM. The results were encouraging, with a 62.7% success rate in guided detection of vulnerabilities. Second, we challenge the model under a "blind" audit setup, in which the vulnerability type is not provided in the prompt and 219 contracts undergo 40 tests each. This setup evaluates general vulnerability detection capability without hints in the prompt. Under these conditions, a 60.71% success rate was observed. While the results are promising, we still emphasize the need for human auditing at this time. We provide this study as a proof of concept for a cost-effective smart contract auditing process, moving towards democratic access to security.
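To make the scale of the two phases concrete, the following is a minimal sketch of how repeated binary answers could be aggregated into a success rate. It assumes the success rate is the pooled fraction of runs whose binary answer matches the known label; the abstract does not state the exact aggregation, and the function names and record layout here are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def total_trials(num_contracts: int, runs_per_contract: int) -> int:
    """Number of model queries when every contract is tested the same number of times."""
    return num_contracts * runs_per_contract


def success_rates(trials):
    """Aggregate repeated binary audit answers.

    `trials` is an iterable of (contract_id, predicted_vulnerable, actually_vulnerable)
    tuples (a hypothetical record layout). Returns the pooled success rate over all
    runs and a per-contract success rate.
    """
    per_contract = defaultdict(lambda: [0, 0])  # contract_id -> [correct, total]
    for contract_id, predicted, actual in trials:
        per_contract[contract_id][0] += int(predicted == actual)
        per_contract[contract_id][1] += 1

    rates = {cid: correct / total for cid, (correct, total) in per_contract.items()}
    correct_all = sum(correct for correct, _ in per_contract.values())
    total_all = sum(total for _, total in per_contract.values())
    overall = correct_all / total_all if total_all else 0.0
    return overall, rates


# Phase sizes implied by the abstract: 52 and 219 contracts, 40 runs each.
phase_one_runs = total_trials(52, 40)
phase_two_runs = total_trials(219, 40)

# Toy data only; these are not results from the study.
toy_trials = [
    ("contract_a", True, True),
    ("contract_a", False, True),
    ("contract_b", True, True),
    ("contract_b", True, True),
]
toy_overall, toy_per_contract = success_rates(toy_trials)
```

Under these counts, Phase One corresponds to 2,080 model runs and Phase Two to 8,760. Tracking a per-contract rate alongside the pooled rate is one way to surface the run-to-run variability that the study highlights.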

Paper Structure (34 sections, 3 figures, 2 tables)

Figures (3)

  • Figure 1: RAG LLM Pipeline. Adapted from "How to Connect LLM to External Sources Using RAG?" by Rajeev Sharma, https://markovate.com/blog/connect-llm-using-rag/.
  • Figure 2: Phase one prompt template used for RAG with LangChain
  • Figure 3: Phase One results, efficacy per individual smart contract. See Appendix Table 3 for the corresponding smart contracts.