Prompt Engineering vs. Fine-Tuning for LLM-Based Vulnerability Detection in Solana and Algorand Smart Contracts
Biagio Boi, Christian Esposito
TL;DR
This work addresses the detection of OWASP-inspired vulnerabilities in non-EVM smart contracts. The authors build synthetic Rust (Solana) and PyTeal (Algorand) datasets and evaluate LLM-based vulnerability assessment under three configurations: prompt engineering, fine-tuning, and a hybrid of the two. The results show that prompt engineering provides a robust baseline across platforms, while fine-tuning improves precision and recall for less semantically rich languages such as TEAL. The study also maps OWASP Top 10 vulnerabilities onto the Algorand and Solana architectures; these platform-specific mappings shape vulnerability coverage, expose architecture-driven differences in risk surface, and highlight the need for cross-platform security tooling. Overall, the results indicate that LLM-based vulnerability detection is viable for static analysis in diverse blockchain environments when domain-specific data and taxonomy are incorporated, with practical implications for cross-chain security tooling and auditing workflows.
Abstract
Smart contracts have emerged as key components within decentralized environments, enabling the automation of transactions through self-executing programs. While these innovations offer significant advantages, they also introduce security risks when the smart contract code is not carefully designed and implemented. This paper investigates the capability of large language models (LLMs) to detect OWASP-inspired vulnerabilities in smart contracts beyond the Ethereum Virtual Machine (EVM) ecosystem, focusing specifically on Solana and Algorand. Given the lack of labeled datasets for non-EVM platforms, we design a synthetic dataset of annotated smart contract snippets in Rust (for Solana) and PyTeal (for Algorand), structured around a vulnerability taxonomy derived from OWASP. We evaluate LLMs under three configurations (prompt engineering, fine-tuning, and a hybrid of both) and compare their performance across vulnerability categories. Experimental results show that prompt engineering achieves general robustness, while fine-tuning improves precision and recall on less semantically rich languages such as TEAL. Additionally, we analyze how the architectural differences between Solana and Algorand influence the manifestation and detectability of vulnerabilities, offering platform-specific mappings that highlight limitations in existing security tooling. Our findings suggest that LLM-based approaches are viable for static vulnerability detection in smart contracts, provided that domain-specific data and categorization are integrated into training pipelines.
