An Initial Exploration of Fine-tuning Small Language Models for Smart Contract Reentrancy Vulnerability Detection
Ignacio Mariano Andreozzi Pofcher, Joshua Ellul
TL;DR
This study investigates fine-tuning small language models (1–3B parameters) for Solidity reentrancy vulnerability detection using LoRA and a synthetic data pipeline, aiming for on-device, privacy-preserving analysis. It compares LLaMA 3B and Qwen2.5-Coder 3B architectures, evaluates against a robust, modernization-aware dataset that combines synthetic patterns and real-world exploits, and analyzes performance, errors, and architectural factors. The results show substantial gains from fine-tuning (up to 19 percentage points for LLaMA and 14 for Qwen) with an 8% cross-architecture gap in accuracy, and reveal that a larger 14B model only modestly outperforms the small models at significantly higher computational cost. The work highlights the importance of architectural choices, data diversity, and uncertainty handling, and suggests future directions including model scaling, ensembles, and explicit uncertainty quantification for practical vulnerability detection workflows.
Abstract
Large Language Models (LLMs) are increasingly used for coding tasks, including helping developers identify bugs, and they are a promising avenue for supporting vulnerability detection, particularly given the flexibility of generative AI models and tools. For many tasks, however, LLMs may not be suitable, and smaller language models that fit on a developer's computer and can be easily run and trained there may be a better choice. In this paper we explore and evaluate whether smaller language models can be fine-tuned to achieve reasonable results in a niche area, vulnerability detection, focusing specifically on detecting the reentrancy bug in Solidity smart contracts.
