SCLA: Automated Smart Contract Summarization via LLMs and Control Flow Prompt

Xiaoqi Li, Yingjie Mao, Zexin Lu, Wenkai Li, Zongwei Li

TL;DR

SCLA introduces a control-flow–aware framework for automated smart contract summarization by embedding CFG-derived semantics into LLM prompts. It combines SemFlow-based control-flow extraction with SBERT-driven semantic retrieval to craft semantically enriched, few-shot prompts that guide LLMs to produce more accurate and secure summaries. Across Solidity, Java, and Python datasets, SCLA achieves substantial gains over state-of-the-art baselines in BLEU-4, METEOR, ROUGE-L, and BLEURT, and its ablation, human evaluation, and generalization studies corroborate the robustness of the approach. The work demonstrates the practical potential of integrating structural code analysis with LLM prompting to improve code understanding and vulnerability mitigation in real-world smart contracts.
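The retrieval step mentioned above, which picks few-shot examples by semantic similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: a bag-of-words count vector stands in for SBERT embeddings so the example runs with the standard library only, and the names `embed`, `retrieve_examples`, `build_few_shot_prompt`, `CORPUS`, and `QUERY` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Assumed stand-in for an SBERT encoder: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_examples(query_code, corpus, k=2):
    """Return the k (code, summary) pairs whose code is most similar to the query."""
    q = embed(query_code)
    ranked = sorted(corpus, key=lambda pair: cosine(q, embed(pair[0])), reverse=True)
    return ranked[:k]


def build_few_shot_prompt(query_code, examples):
    """Lay out the retrieved pairs as demonstrations, then the unanswered query."""
    parts = [f"Code:\n{code}\nSummary: {summary}\n" for code, summary in examples]
    parts.append(f"Code:\n{query_code}\nSummary:")
    return "\n".join(parts)


# Hypothetical toy corpus of (code, reference summary) pairs.
CORPUS = [
    ("function transfer(address to, uint amount) public { balances[to] += amount; }",
     "Transfers tokens to an address."),
    ("function owner() public view returns (address) { return _owner; }",
     "Returns the contract owner."),
    ("function mint(address to, uint amount) public { balances[to] += amount; totalSupply += amount; }",
     "Mints new tokens to an address."),
]
QUERY = "function burn(address from, uint amount) public { balances[from] -= amount; totalSupply -= amount; }"
```

Calling `build_few_shot_prompt(QUERY, retrieve_examples(QUERY, CORPUS))` yields a prompt whose demonstrations are the two token-handling functions rather than the unrelated getter. With a real sentence encoder, only `embed` and `cosine` would need to change.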

Abstract

Smart contract code summarization is crucial for efficient maintenance and vulnerability mitigation. While many studies use Large Language Models (LLMs) for summarization, their performance still falls short of fine-tuned models like CodeT5+ and CodeBERT. Some approaches combine LLMs with data flow analysis but fail to fully capture the hierarchy and control structures of the code, leading to information loss and degraded summarization quality. We propose SCLA, an LLM-based method that enhances summarization by integrating a Control Flow Graph (CFG) and semantic facts from the code's control flow into a semantically enriched prompt. SCLA uses a control flow extraction algorithm to derive control flows from semantic nodes in the Abstract Syntax Tree (AST) and constructs the corresponding CFG. Code semantic facts refer to both explicit and implicit information within the AST that is relevant to smart contracts. This method enables LLMs to better capture the structural and contextual dependencies of the code. We validate the effectiveness of SCLA through comprehensive experiments on a dataset of 40,000 real-world smart contracts. The experiments show that SCLA significantly improves summarization quality, outperforming the SOTA baselines with improvements of 26.7%, 23.2%, 16.7%, and 14.7% in BLEU-4, METEOR, ROUGE-L, and BLEURT scores, respectively.
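The idea of deriving control-flow facts from AST nodes and placing them in a prompt can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's extraction algorithm: it uses Python's built-in `ast` module on Python source (a Solidity parser would require a third-party package), and it emits a nesting-annotated list of control statements rather than a full CFG with edges. The names `CONTROL_NODES`, `control_flow_facts`, `build_prompt`, and `EXAMPLE` are hypothetical.

```python
import ast

# Assumed mapping from AST node types to control-flow fact labels.
CONTROL_NODES = {
    ast.If: "branch",
    ast.For: "loop",
    ast.While: "loop",
    ast.Return: "return",
    ast.Raise: "raise",
}


def control_flow_facts(source):
    """Collect one fact per control statement, in source order, indented by nesting depth."""
    facts = []

    def visit(node, depth):
        for child in ast.iter_child_nodes(node):
            kind = CONTROL_NODES.get(type(child))
            if kind is not None:
                # Keep only the first line of the statement (its header).
                header = ast.get_source_segment(source, child).splitlines()[0].strip()
                facts.append(f"{'  ' * depth}{kind}: {header}")
                visit(child, depth + 1)
            else:
                visit(child, depth)

    visit(ast.parse(source), 0)
    return facts


def build_prompt(source):
    """Combine the extracted facts and the raw code into one summarization prompt."""
    facts = "\n".join(control_flow_facts(source))
    return (
        "Summarize the function below in one sentence.\n"
        f"Control flow facts:\n{facts}\n"
        f"Code:\n{source.strip()}\n"
        "Summary:"
    )


# Hypothetical input used only for illustration.
EXAMPLE = '''
def withdraw(balance, amount):
    if amount > balance:
        raise ValueError("insufficient funds")
    for fee in (1, 2):
        amount += fee
    return balance - amount
'''
```

For `EXAMPLE`, the facts list the guard branch, the `raise` nested inside it, the fee loop, and the final return, which gives the model an explicit outline of the function's structure alongside the code itself.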

Paper Structure (18 sections, 3 equations, 5 figures, 8 tables, 1 algorithm)

Figures (5)

  • Figure 1: Overview of our proposed framework, SCLA. Powered by Google's Gemini-1.5-Pro, SCLA automatically generates smart contract code summaries: it extracts control flow semantic facts from the smart contract code and uses Gemini-1.5-Pro to generate a summary from those facts.
  • Figure 2: An Example of Control Flow Prompt.
  • Figure 3: The Comparison of BLEU, METEOR, and ROUGE-L Scores on Our Test Set Under Five Different LLMs, Using the SCCLLM and the Proposed SCLA for Zero-Shot Summarization Tasks.
  • Figure 4: Human Evaluation Results of 300 Code Summarizations Generated by SCLA and the Baseline.
  • Figure 5: An Example of a Function Call Graph in Which Gemini-1.5-Pro Has Difficulty Understanding the Call Information.