SCLA: Automated Smart Contract Summarization via LLMs and Control Flow Prompt

Xiaoqi Li; Yingjie Mao; Zexin Lu; Wenkai Li; Zongwei Li

TL;DR

SCLA introduces a control-flow–aware framework for automated smart contract summarization by embedding CFG-derived semantics into LLM prompts. It combines SemFlow-based control-flow extraction with SBERT-driven semantic retrieval to craft semantically enriched, few-shot prompts that guide LLMs to produce more accurate and secure summaries. Across Solidity, Java, and Python datasets, SCLA achieves substantial gains over state-of-the-art baselines in BLEU-4, METEOR, ROUGE-L, and BLEURT, and its ablation, human evaluation, and generalization studies corroborate the robustness of the approach. The work demonstrates the practical potential of integrating structural code analysis with LLM prompting to improve code understanding and vulnerability mitigation in real-world smart contracts.

Abstract

Smart contract code summarization is crucial for efficient maintenance and vulnerability mitigation. While many studies use Large Language Models (LLMs) for summarization, their performance still falls short of fine-tuned models such as CodeT5+ and CodeBERT. Some approaches combine LLMs with data flow analysis but fail to fully capture the hierarchy and control structures of the code, leading to information loss and degraded summarization quality. We propose SCLA, an LLM-based method that enhances summarization by integrating a Control Flow Graph (CFG) and semantic facts from the code's control flow into a semantically enriched prompt. SCLA uses a control flow extraction algorithm to derive control flows from semantic nodes in the Abstract Syntax Tree (AST) and constructs the corresponding CFG. Code semantic facts refer to both explicit and implicit information within the AST that is relevant to smart contracts. This method enables LLMs to better capture the structural and contextual dependencies of the code. We validate the effectiveness of SCLA through comprehensive experiments on a dataset of 40,000 real-world smart contracts. The experiments show that SCLA significantly improves summarization quality, outperforming the SOTA baselines by 26.7%, 23.2%, 16.7%, and 14.7% in BLEU-4, METEOR, ROUGE-L, and BLEURT scores, respectively.
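The idea of deriving control flow facts from AST nodes and placing them in a prompt can be illustrated with a minimal sketch. It is not the paper's extraction algorithm: it assumes Python source and the standard-library `ast` module instead of a Solidity parser, and the names `BRANCH_NODES`, `control_flow_facts`, `build_prompt`, and `SAMPLE`, as well as the prompt wording, are hypothetical choices made for illustration only.

```python
import ast

# Assumption: these node types stand in for the "semantic nodes" of interest.
BRANCH_NODES = {
    ast.If: "if",
    ast.For: "for",
    ast.While: "while",
    ast.Return: "return",
    ast.Raise: "raise",
}


def control_flow_facts(source: str) -> list[str]:
    """Return one line per control-flow statement, indented by nesting depth."""
    facts: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            kind = BRANCH_NODES.get(type(child))
            if kind is None:
                # Not a control-flow node: keep descending at the same depth.
                visit(child, depth)
                continue
            detail = ""
            if isinstance(child, (ast.If, ast.While)):
                detail = " " + ast.unparse(child.test)
            elif isinstance(child, ast.For):
                detail = f" {ast.unparse(child.target)} in {ast.unparse(child.iter)}"
            facts.append("  " * depth + kind + detail)
            visit(child, depth + 1)

    visit(ast.parse(source), 0)
    return facts


def build_prompt(source: str, facts: list[str]) -> str:
    """Assemble a prompt that lists the control flow facts before the code."""
    return (
        "Summarize the function below in one sentence.\n\n"
        "Control flow facts:\n" + "\n".join(facts) + "\n\n"
        "Code:\n" + source.strip() + "\n"
    )


# Hypothetical example input, written in Python rather than Solidity.
SAMPLE = '''
def withdraw(balance, amount, fees):
    if amount > balance:
        raise ValueError("insufficient funds")
    for fee in fees:
        amount += fee
    return balance - amount
'''

if __name__ == "__main__":
    print(build_prompt(SAMPLE, control_flow_facts(SAMPLE)))
```

The indentation of each fact mirrors the nesting of the control structures, which is one simple way to keep hierarchy information that a flat token sequence would lose. Requires Python 3.9 or later for `ast.unparse`.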

Paper Structure (18 sections, 3 equations, 5 figures, 8 tables, 1 algorithm)

Introduction
Related Work
METHODOLOGY
Control Flow Prompt
Semantic-based Retrieval
SCLA Framework
Control Flow Extraction
EXPERIMENT
Experiment Settings
DataSet
Baseline
Performance Metrics
Main Results
Ablation Study
Human Evaluation of Summarization Generated by SCLA and the Baseline
...and 3 more sections

Figures (5)

Figure 1: Overview of our proposed framework, SCLA, which is powered by Google's Gemini-1.5-Pro and automatically generates smart contract code summaries. SCLA extracts control flow semantic facts from smart contract code and uses Gemini-1.5-Pro to generate code summaries from these facts.
Figure 2: An Example of Control Flow Prompt.
Figure 3: The Comparison of BLEU, METEOR, and ROUGE-L Scores on Our Test Set Under Five Different LLMs, Using the SCCLLM and the Proposed SCLA for Zero-Shot Summarization Tasks.
Figure 4: Human Evaluation Results of 300 Code Summarizations Generated by SCLA and the Baseline.
Figure 5: An Example of a Function Call Graph in Which Gemini-1.5-Pro Has Difficulty Understanding the Call Information.
