SoliDiffy: AST Differencing for Solidity Smart Contracts

Mojtaba Eshghie, Viktor Åryd, Cyrille Artho, Martin Monperrus

TL;DR

Line-based diffs fail to capture structural changes in Solidity smart contracts, which hinders vulnerability detection and automated repair. The paper introduces SoliDiffy, a Solidity-specific AST differencing tool that uses pruning rules and GumTree-based mappings to produce concise edit scripts built from four actions (insert, delete, update, move). Empirically, it achieves a 96.1% diffing success rate on 353,262 contract pairs and yields shorter, more meaningful edit scripts than a text-based baseline. It also performs well on complex changes and compares favorably with Git line-based diffs on real-world commits (Uniswap v4). The work provides a practical, open-source tool that improves understanding of contract evolution and supports automated analysis tasks, and it outlines avenues for hybrid approaches and broader applicability.

Abstract

Structured code differencing is the act of comparing the hierarchical structure of code via its abstract syntax tree (AST) to capture modifications. AST-based source code differencing enables tasks such as vulnerability detection and automated repair where traditional line-based differencing falls short. We introduce SoliDiffy, the first AST differencing tool for Solidity smart contracts with the ability to generate an edit script that soundly shows the structural differences between two smart contracts using insert, delete, update, and move operations. In our evaluation on 353,262 contract pairs, SoliDiffy achieved a 96.1% diffing success rate, surpassing the state of the art, and produced significantly shorter edit scripts. Additional experiments on 925 real-world commits further confirmed its superiority compared to Git line-based differencing. SoliDiffy provides accurate representations of smart contract evolution even in the presence of multiple complex modifications to the source code. SoliDiffy is made publicly available at https://github.com/mojtaba-eshghie/SoliDiffy.
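
To make the idea of an edit script concrete, the following is a minimal sketch of a positional tree diff in Python. It is not SoliDiffy's algorithm or output format: the `Node` class, the `diff` function, and the node kinds (`FunctionDefinition`, `Parameter`, and so on) are hypothetical names chosen for illustration. The sketch emits only insert, delete, and update actions; detecting moves requires a node-matching phase, such as the one GumTree provides, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Toy AST node (hypothetical; not SoliDiffy's tree representation)."""
    kind: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)


Action = Tuple[str, str, str]  # (action name, path in the tree, detail)


def diff(old: Node, new: Node, path: str = "root") -> List[Action]:
    """Naive positional diff: children are compared index by index."""
    if old.kind != new.kind:
        # Different node kinds: treat as a replacement of the whole subtree.
        return [("delete", path, old.kind), ("insert", path, new.kind)]
    actions: List[Action] = []
    if old.label != new.label:
        actions.append(("update", path, f"{old.label} -> {new.label}"))
    common = min(len(old.children), len(new.children))
    for i in range(common):
        actions += diff(old.children[i], new.children[i], f"{path}/{i}")
    for i in range(common, len(old.children)):
        actions.append(("delete", f"{path}/{i}", old.children[i].kind))
    for i in range(common, len(new.children)):
        actions.append(("insert", f"{path}/{i}", new.children[i].kind))
    return actions


# Hypothetical before/after versions of a small setter function.
old = Node("FunctionDefinition", "set", [
    Node("Parameter", "x"),
    Node("Assignment", "storedData = x"),
])
new = Node("FunctionDefinition", "set", [
    Node("Parameter", "value"),
    Node("Assignment", "storedData = value"),
    Node("EmitStatement", "DataChanged"),
])

script = diff(old, new)
for action in script:
    print(action)
```

In this toy example the script contains two updates and one insert. Each action names a tree position rather than a text line, which is the property that distinguishes an AST edit script from a line-based diff.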

Paper Structure

This paper contains 36 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Standard line diff of original and modified SimpleStorage contract, with added and removed lines highlighted.
  • Figure 2: The design of the SoliDiffy smart contract differencing tool.
  • Figure 3: Edit script generated by SoliDiffy for the diffing task shown in Figure 1.
  • Figure 4: Pipeline for large-scale generation of contract pair for differencing and subsequent analysis.
  • Figure 5: RQ1: Histogram of mean edit script lengths per project for SoliDiffy and Difftastic across all diff pairs of the project ($n = 336{,}331$). The long tail of Difftastic's distribution is trimmed at 100 to fit the plot, as it continues beyond 500. Each pair of bars represents the frequency of projects falling within specific mean edit distance ranges.
  • ...and 3 more figures