
An end-to-end agentic pipeline for smart contract translation and quality evaluation

Abhinav Goel, Chaitya Shah, Agostino Capponi, Alfio Gliozzo

TL;DR

The paper presents an end-to-end agentic pipeline that converts natural-language smart contract specifications into Solidity code and a structured evaluation, with automated security checks and provenance tracking. It combines multi-agent orchestration, FSM-based verification, and a five-dimensional quality rubric to produce artifacts that can be compared against ground truth, along with reproducible benchmarks. On the FSM-SCG benchmark, the average composite quality score is around 81.5 with strong compilation rates, and an automated security-refinement loop reduces the number of issues rated medium severity or higher, including critical ones. The findings show that scalable, interactive AI-assisted contract generation is practical, while highlighting limitations on complex specifications and the need for further work on verification, compliance, and gas costs.

Abstract

We present an end-to-end framework for systematic evaluation of LLM-generated smart contracts from natural-language specifications. The system parses contractual text into structured schemas, generates Solidity code, and performs automated quality assessment through compilation and security checks. Using CrewAI-style agent teams with iterative refinement, the pipeline produces structured artifacts with full provenance metadata. Quality is measured across five dimensions (functional completeness, variable fidelity, state-machine correctness, business-logic fidelity, and code quality), which are aggregated into composite scores. The framework supports paired evaluation against ground-truth implementations, quantifying alignment and identifying systematic error modes such as logic omissions and state transition inconsistencies. This provides a reproducible benchmark for empirical research on smart contract synthesis quality and supports extensions to formal verification and compliance checking.
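
To make the aggregation step concrete, the following is a minimal sketch of how five per-dimension scores could be combined into one composite score. The dimension names come from the abstract; the 0-100 scale, the equal weighting, and the names `QualityScores`, `EQUAL_WEIGHTS`, and `composite_score` are illustrative assumptions, not the paper's actual formula.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class QualityScores:
    """Per-dimension scores; the 0-100 scale is an assumption."""
    functional_completeness: float
    variable_fidelity: float
    state_machine_correctness: float
    business_logic_fidelity: float
    code_quality: float


# Hypothetical equal weights: the weighting used in the paper is not stated here.
EQUAL_WEIGHTS = {f.name: 1.0 for f in fields(QualityScores)}


def composite_score(scores: QualityScores, weights: dict = EQUAL_WEIGHTS) -> float:
    """Return the weighted mean of the five dimension scores."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(getattr(scores, name) * w for name, w in weights.items())
    return weighted_sum / total_weight


example = QualityScores(
    functional_completeness=90.0,
    variable_fidelity=80.0,
    state_machine_correctness=70.0,
    business_logic_fidelity=85.0,
    code_quality=75.0,
)
print(composite_score(example))
```

Normalizing by the sum of the weights keeps the composite on the same scale as the inputs, so a different weighting can be substituted without rescaling the result.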

Paper Structure (47 sections, 2 equations, 4 figures, 11 tables)

Figures (4)

  • Figure 1: Single Contract Mode Interface
  • Figure 2: End-to-end agentic pipeline architecture for smart contract generation, testing and deployment.
  • Figure 3: Grade Distribution for Generated Smart Contracts (N=9,000). The pipeline produced predominantly B-grade contracts (66.4%), with 7.3% achieving A-grade excellence and only 2.2% failing completely.
  • Figure 4: Detailed Metric Comparison: Generated vs. Ground Truth. Generated contracts (blue) consistently outperform ground truth (orange) across all dimensions.