NATLM: Detecting Defects in NFT Smart Contracts Leveraging LLM
Yuanzheng Niu, Xiaoqi Li, Wenkai Li
TL;DR
The paper tackles NFT smart contract security by combining static analysis with a capable LLM (Gemini Pro 1.5) to detect four NFT-specific defects. NATLM builds a knowledge base from AST and CFG features and retrieves relevant defect embeddings via a vector database before applying deep semantic reasoning in an LLM, complemented by a weighted loss and confidence thresholding. On 8,672 NFT contracts, NATLM delivers high precision (87.72%), recall (89.58%), and F1 (88.94%), outperforming traditional static-analysis tools and standalone LLM baselines. This hybrid approach offers scalable, accurate NFT vulnerability detection with practical implications for audits and secure NFT deployments.
Abstract
Security issues are becoming increasingly significant with the rapid evolution of Non-fungible Tokens (NFTs). As NFTs are traded as digital assets, they have emerged as prime targets for cyber attackers. NFT smart contracts may contain undiscovered defects that could lead to substantial financial losses if exploited. To tackle this issue, this paper presents a framework called NATLM (NFT Assistant LLM), designed to detect potential defects in NFT smart contracts. The framework identifies four common types of vulnerabilities in NFT smart contracts: ERC-721 Reentrancy, Public Burn, Risky Mutable Proxy, and Unlimited Minting. Relying exclusively on large language models (LLMs) for defect detection can lead to a high false-positive rate. To enhance detection performance, NATLM integrates static analysis with an LLM, specifically Gemini Pro 1.5. First, NATLM employs static analysis to extract structural, syntactic, and execution flow information from the code, represented through Abstract Syntax Trees (AST) and Control Flow Graphs (CFG). These extracted features are then combined with vectors of known defect examples to create a matrix that is stored in the knowledge base. Next, the feature vectors and code vectors of the contract under analysis are compared with the contents of the knowledge base. Finally, the LLM performs deep semantic analysis to enhance detection capabilities, providing a more comprehensive and accurate identification of potential security issues. In experiments on 8,672 collected NFT smart contracts, NATLM achieved an overall precision of 87.72%, a recall of 89.58%, and an F1 score of 88.94%. These results outperform the baselines, and NATLM successfully identifies all four common defect types.
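The comparison step described in the abstract, in which vectors of the analyzed contract are matched against a knowledge base of known defects, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the 4-dimensional vectors are made up, cosine similarity stands in for whatever metric the vector database uses, and the names `KNOWLEDGE_BASE`, `retrieve_candidates`, `top_k`, and `threshold` are hypothetical.

```python
import math

# Hypothetical knowledge base: defect label -> toy feature vector.
# In NATLM the entries combine AST/CFG features with vectors of known
# defect examples; these values are invented purely for illustration.
KNOWLEDGE_BASE = {
    "ERC-721 Reentrancy": [0.9, 0.1, 0.0, 0.2],
    "Public Burn": [0.1, 0.8, 0.1, 0.0],
    "Risky Mutable Proxy": [0.0, 0.2, 0.9, 0.1],
    "Unlimited Minting": [0.2, 0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def retrieve_candidates(contract_vector, top_k=2, threshold=0.5):
    """Return up to top_k (label, score) pairs scoring at least threshold.

    The threshold is a hypothetical similarity cutoff for this sketch,
    not a value taken from the paper.
    """
    scored = [
        (label, cosine_similarity(contract_vector, vector))
        for label, vector in KNOWLEDGE_BASE.items()
    ]
    scored = [pair for pair in scored if pair[1] >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # A made-up contract vector that lies close to the reentrancy entry.
    for label, score in retrieve_candidates([0.85, 0.15, 0.05, 0.1]):
        print(f"{label}: {score:.3f}")
```

In a pipeline of this shape, the retrieved defect labels and their examples would be passed to the LLM as context for the final semantic analysis, so that the model reasons over likely defect candidates instead of classifying the contract from scratch.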
