An end-to-end agentic pipeline for smart contract translation and quality evaluation
Abhinav Goel, Chaitya Shah, Agostino Capponi, Alfio Gliozzo
TL;DR
The paper presents an end-to-end agentic pipeline that converts natural-language smart contract specifications into Solidity code and a structured evaluation, enabling automated security checks and provenance tracking. It combines multi-agent orchestration, FSM-based verification, and a five-dimensional quality rubric to produce artifacts that can be compared against ground truth, along with reproducible benchmarks. Empirical results on the FSM-SCG benchmark show an average composite quality score of about 81.5 with strong compilation rates, and a notable improvement from an automated security-refinement loop that reduces issues rated medium severity or higher, including critical ones. The findings demonstrate the practicality of scalable, interactive AI-assisted contract generation while highlighting limitations on complex specifications and the need for further work on verification, compliance, and gas costs.
Abstract
We present an end-to-end framework for systematic evaluation of LLM-generated smart contracts from natural-language specifications. The system parses contractual text into structured schemas, generates Solidity code, and performs automated quality assessment through compilation and security checks. Using CrewAI-style agent teams with iterative refinement, the pipeline produces structured artifacts with full provenance metadata. Quality is measured across five dimensions (functional completeness, variable fidelity, state-machine correctness, business-logic fidelity, and code quality), which are aggregated into composite scores. The framework supports paired evaluation against ground-truth implementations, quantifying alignment and identifying systematic error modes such as logic omissions and state-transition inconsistencies. The result is a reproducible benchmark for empirical research on smart contract synthesis quality that also supports extensions to formal verification and compliance checking.
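To make the aggregation step concrete, the following is a minimal sketch of how five per-dimension scores could be combined into a composite. The abstract does not state the weighting scheme or the score scale, so the equal default weights, the 0 to 100 range, and the names `RubricScores` and `composite_score` are illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class RubricScores:
    """Per-dimension scores; a 0-100 scale is assumed for illustration."""
    functional_completeness: float
    variable_fidelity: float
    state_machine_correctness: float
    business_logic_fidelity: float
    code_quality: float


def composite_score(scores: RubricScores,
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of the five dimensions.

    Equal weights are an assumption: the source does not say how the
    dimensions are weighted when they are aggregated.
    """
    names = [f.name for f in fields(scores)]
    if weights is None:
        weights = {name: 1.0 for name in names}
    if set(weights) != set(names):
        raise ValueError("weights must cover exactly the five dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    for name in names:
        value = getattr(scores, name)
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must lie in [0, 100], got {value}")
    weighted_sum = sum(getattr(scores, name) * weights[name] for name in names)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical scores for one generated contract, not data from the paper.
    example = RubricScores(90, 80, 70, 85, 75)
    print(composite_score(example))
```

Passing a `weights` dictionary lets one dimension, for example state-machine correctness, count more heavily than the others without changing the rest of the calculation.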
