Large Language Model based Smart Contract Auditing with LLMBugScanner
Yining Yuan, Yifei Wang, Yichang Xu, Zachary Yahn, Sihao Hu, Ling Liu
TL;DR
The paper tackles unreliable vulnerability detection in smart contracts by introducing LLMBugScanner, which couples domain knowledge adaptation with ensemble reasoning to improve generalization across vulnerability types and code structures. It employs two-stage fine-tuning via LoRA on a broad Ethereum dataset and a CVE-derived instructional set, and combines multiple lightweight LLMs through weighted majority and tie-breaking voting to boost robustness and coverage. Experimental results on a CVE-Solidity benchmark show that fine-tuned models outperform baselines and that ensembles achieve the highest Top-5 hit rates, reaching 60% Top-5 accuracy on 108 CVE-labeled contracts and a 19% improvement over single-model baselines. The framework is presented as scalable, cost-effective, and extensible, with future directions including learning-based ensembles, hallucination mitigation, and code normalization to further enhance reliability in real-world smart contract auditing.
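To make the Top-5 metric concrete, the following is a minimal sketch of how a Top-k hit rate could be computed. It assumes that a "hit" means the ground-truth vulnerability label appears among a contract's first k ranked predictions; the summary does not spell out the exact matching rule. The function name, the label strings, and the toy data are all hypothetical and are not taken from the paper's benchmark.

```python
from typing import Sequence


def top_k_hit_rate(
    ranked_predictions: Sequence[Sequence[str]],
    ground_truth: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of contracts whose true label is in the first k predictions.

    Assumption: one ground-truth label per contract and exact string matching.
    """
    if len(ranked_predictions) != len(ground_truth):
        raise ValueError("need exactly one prediction list per contract")
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, ground_truth)
        if truth in preds[:k]
    )
    return hits / len(ground_truth)


# Hypothetical toy data: four contracts, each with a ranked prediction list.
predictions = [
    ["integer overflow", "reentrancy", "access control"],
    ["bad randomness", "access control"],
    ["reentrancy", "unchecked call"],
    ["access control", "integer overflow"],
]
labels = ["integer overflow", "access control", "integer overflow", "bad randomness"]

top5 = top_k_hit_rate(predictions, labels, k=5)
top1 = top_k_hit_rate(predictions, labels, k=1)
```

In this toy data the first two contracts count as Top-5 hits and only the first counts as a Top-1 hit, which illustrates why a Top-5 figure is more forgiving than exact first-guess accuracy.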
Abstract
This paper presents LLMBugScanner, a large language model (LLM) based framework for smart contract vulnerability detection using fine-tuning and ensemble learning. Smart contract auditing presents several challenges for LLMs: different pretrained models exhibit varying reasoning abilities, and no single model performs consistently well across all vulnerability types or contract structures. These limitations persist even after fine-tuning individual LLMs. To address these challenges, LLMBugScanner combines domain knowledge adaptation with ensemble reasoning to improve robustness and generalization. Through domain knowledge adaptation, we fine-tune LLMs on complementary datasets to capture both general code semantics and instruction-guided vulnerability reasoning, using parameter-efficient tuning to reduce computational cost. Through ensemble reasoning, we leverage the complementary strengths of multiple LLMs and apply a consensus-based conflict resolution strategy to produce more reliable vulnerability assessments. We conduct extensive experiments across multiple popular LLMs and compare LLMBugScanner with both pretrained and fine-tuned individual models. Results show that LLMBugScanner achieves consistent accuracy improvements and stronger generalization, demonstrating that it provides a principled, cost-effective, and extensible framework for smart contract auditing.
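One plausible way to realise consensus-based conflict resolution is a weighted vote over the labels proposed by each model, with ties resolved by deferring to a designated model's ranking. The sketch below illustrates that idea only. The weights, the tie-breaking rule, the model names, and the function `weighted_vote` are assumptions made for illustration; the abstract does not specify how LLMBugScanner sets weights or breaks ties.

```python
from __future__ import annotations

from collections import defaultdict


def weighted_vote(
    model_predictions: dict[str, list[str]],
    weights: dict[str, float],
    tie_breaker: str,
    top_k: int = 5,
) -> list[str]:
    """Merge per-model ranked label lists into a single ranked list.

    Each model adds its weight once to every label it proposes.
    Labels with equal scores are ordered by the tie-breaker model's
    ranking (assumed rule), then alphabetically for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for model, labels in model_predictions.items():
        for label in set(labels):  # a model votes at most once per label
            scores[label] += weights.get(model, 1.0)

    preferred = model_predictions.get(tie_breaker, [])

    def sort_key(label: str) -> tuple[float, int, str]:
        rank = preferred.index(label) if label in preferred else len(preferred)
        return (-scores[label], rank, label)

    return sorted(scores, key=sort_key)[:top_k]


# Hypothetical outputs from three fine-tuned models for one contract.
predictions = {
    "model_a": ["reentrancy", "integer overflow"],
    "model_b": ["integer overflow", "access control"],
    "model_c": ["access control", "reentrancy"],
}
# Hypothetical weights, e.g. derived from validation performance.
weights = {"model_a": 2.0, "model_b": 1.0, "model_c": 1.0}

merged = weighted_vote(predictions, weights, tie_breaker="model_a")
```

Here "reentrancy" and "integer overflow" receive the same total weight, so their order depends on which model is chosen as the tie-breaker. That is the kind of disagreement a conflict resolution rule has to settle.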
