Table of Contents
Fetching ...

An Operational Semantics for Yul

Vasileios Koutavas, Yu-Yang Lin, Nikos Tzevelekos

TL;DR

This work delivers a rigorous, tool-agnostic operational semantics for Yul, the Solidity compiler's intermediate representation, in both big-step and small-step forms and proves their equivalence. It provides a modular interpreter (YulTracer) parameterised by dialects (illustrated on a Shanghai-EVM subset) and validates the approach against Solidity tests and custom benchmarks. By clarifying under-specified aspects such as void function calls and multi-value returns, the paper establishes a foundation for formal verification and symbolic execution directly in Yul, with implications for security and correctness in Ethereum smart contracts. The results enable verifiable optimisations, guide future type-system work, and aim to unify disparate formalizations under a standard mathematical model.

Abstract

We present a big-step and small-step operational semantics for Yul -- the intermediate language used by the Solidity compiler to produce EVM bytecode -- in a mathematical notation that is congruous with the literature of programming languages, lends itself to language proofs, and can serve as a precise, widely accessible specification for the language. Our two semantics stay faithful to the original, informal specification of the language but also clarify under-specified cases such as void function calls. Our presentation allows us to prove the equivalence between the two semantics. We also implement the small-step semantics in an interpreter for Yul which avails of optimisations that are provably correct. We have tested the interpreter using tests from the Solidity compiler and our own. We envisage that this work will enable the development of verification and symbolic execution technology directly in Yul, contributing to the Ethereum security ecosystem, as well as aid the development of a provably sound future type system.

An Operational Semantics for Yul

TL;DR

This work delivers a rigorous, tool-agnostic operational semantics for Yul, the Solidity compiler's intermediate representation, in both big-step and small-step forms and proves their equivalence. It provides a modular interpreter (YulTracer) parameterised by dialects (illustrated on a Shanghai-EVM subset) and validates the approach against Solidity tests and custom benchmarks. By clarifying under-specified aspects such as void function calls and multi-value returns, the paper establishes a foundation for formal verification and symbolic execution directly in Yul, with implications for security and correctness in Ethereum smart contracts. The results enable verifiable optimisations, guide future type-system work, and aim to unify disparate formalizations under a standard mathematical model.

Abstract

We present a big-step and small-step operational semantics for Yul -- the intermediate language used by the Solidity compiler to produce EVM bytecode -- in a mathematical notation that is congruous with the literature of programming languages, lends itself to language proofs, and can serve as a precise, widely accessible specification for the language. Our two semantics stay faithful to the original, informal specification of the language but also clarify under-specified cases such as void function calls. Our presentation allows us to prove the equivalence between the two semantics. We also implement the small-step semantics in an interpreter for Yul which avails of optimisations that are provably correct. We have tested the interpreter using tests from the Solidity compiler and our own. We envisage that this work will enable the development of verification and symbolic execution technology directly in Yul, contributing to the Ethereum security ecosystem, as well as aid the development of a provably sound future type system.
Paper Structure (19 sections, 11 theorems, 35 equations, 7 figures)

This paper contains 19 sections, 11 theorems, 35 equations, 7 figures.

Key Result

lemma 1

Given any evaluation context $E$ and a reduction sequence $\ma{\langle {S} \mid G \mathop{;} L \mathop{;} \ma{\mathcal{N}} \rangle} \to^* \ma{\langle {S'} \mid G' \mathop{;} L' \mathop{;} \ma{\mathcal{N}}' \rangle}$ then it is the case that:

Figures (7)

  • Figure 1: Yul source-level syntax.
  • Figure 2: Yul operational syntax (omitting constructs shown in \ref{['fig:yul']}).
  • Figure 3: Big-step (top) and small-step (bottom) block semantics.
  • Figure 4: Big-step (top) and small-step (bottom) declaration semantics. Similar rules for assignments (e.g. $\ma{\vec{x}\ma{\textup{:=}}\tuple {\vec{v}}}$, with $\vec{x}\in\dom{L}$) omitted for economy.
  • Figure 5: Big-step (top) and small-step (bottom) branching semantics. For switch rules we assume side conditions: $C_i = \ma{\mathsf{case}}\,c_i S_i$ and $v,c_{n+1} \not\in \{c_1 ,..,c_n\}$.
  • ...and 2 more figures

Theorems & Definitions (28)

  • remark 1: Typing
  • remark 2: Dialects
  • definition 1: Big-step semantics
  • remark 3
  • definition 2: Small-step semantics
  • remark 4: Block optimisation
  • remark 5: Break optimisation
  • remark 6: Void return and modes
  • lemma 1: Congruence
  • proof
  • ...and 18 more