An Initial Exploration of Fine-tuning Small Language Models for Smart Contract Reentrancy Vulnerability Detection

Ignacio Mariano Andreozzi Pofcher, Joshua Ellul

TL;DR

This study investigates fine-tuning small language models (1–3B parameters) for Solidity reentrancy vulnerability detection using LoRA and a synthetic data pipeline, aiming for on-device, privacy-preserving analysis. It compares the LLaMA 3B and Qwen2.5-Coder 3B architectures, evaluates them against a robust, modernization-aware dataset that combines synthetic patterns and real-world exploits, and analyzes performance, errors, and architectural factors. The results show substantial gains from fine-tuning (up to 19 percentage points for LLaMA and 14 for Qwen) with an 8% cross-architecture gap in accuracy, and reveal that a larger 14B model only modestly outperforms the small models at significantly higher computational cost. The work highlights the importance of architectural choices, data diversity, and uncertainty handling, and suggests future directions including model scaling, ensembles, and explicit uncertainty quantification for practical vulnerability detection workflows.

Abstract

Large Language Models (LLMs) are increasingly used for coding tasks, including helping developers identify bugs, and they are a promising avenue for vulnerability detection, particularly given the flexibility of generative AI models and tools. Yet LLMs are not suitable for every task; in many cases a smaller language model that fits, runs, and trains on a developer's own computer may be the better choice. In this paper we explore and evaluate whether smaller language models can be fine-tuned to achieve reasonable results in a niche area of vulnerability detection: detecting the reentrancy bug in Solidity smart contracts.

Paper Structure

This paper contains 32 sections, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Single-Function Reentrancy Vulnerability Template
  • Figure 2: Reentrancy Guard Secure Contract Example
  • Figure 3: Model Performance Improvement Through Fine-tuning
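
For readers unfamiliar with the bug class named in Figures 1 and 2, the following is a minimal Python simulation of reentrancy. It is not taken from the paper and is not Solidity: the class names (`VulnerableVault`, `GuardedVault`), the `on_receive` callback, and the `run_attack` helper are all hypothetical, chosen only to illustrate the idea. The vulnerable version pays out and makes the external call before zeroing the caller's balance, so the callback can re-enter `withdraw` and drain the pool. The guarded version updates state first and also holds a lock during the external call.

```python
class VulnerableVault:
    """Toy stand-in for a contract that pays out before updating its ledger."""

    def __init__(self):
        self.balances = {}
        self.pool = 0  # total funds held by the vault

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.pool += amount

    def withdraw(self, who, on_receive):
        amount = self.balances.get(who, 0)
        if amount == 0 or self.pool < amount:
            return
        self.pool -= amount       # funds leave the vault
        on_receive(amount)        # external call: the receiver may re-enter
        self.balances[who] = 0    # ledger update happens too late


class GuardedVault(VulnerableVault):
    """Same vault with a reentrancy lock and state updated before the call."""

    def __init__(self):
        super().__init__()
        self._locked = False

    def withdraw(self, who, on_receive):
        if self._locked:
            raise RuntimeError("reentrant call blocked")
        self._locked = True
        try:
            amount = self.balances.get(who, 0)
            if amount == 0 or self.pool < amount:
                return
            self.balances[who] = 0  # effect first
            self.pool -= amount
            on_receive(amount)      # interaction last
        finally:
            self._locked = False


def run_attack(vault):
    """Hypothetical attacker: re-enters withdraw from the payment callback."""
    vault.deposit("victim", 10)
    vault.deposit("attacker", 1)
    received = []

    def on_receive(amount):
        received.append(amount)
        try:
            vault.withdraw("attacker", on_receive)
        except RuntimeError:
            pass  # the guard rejected the nested call

    vault.withdraw("attacker", on_receive)
    return sum(received)


if __name__ == "__main__":
    print("vulnerable vault paid out:", run_attack(VulnerableVault()))
    print("guarded vault paid out:", run_attack(GuardedVault()))
```

In this toy setup the attacker deposits 1 unit yet drains the whole pool from the vulnerable vault, while the guarded vault pays out only the attacker's own deposit. A detector of the kind the paper studies has to tell these two orderings apart from source code alone.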