A Public Dataset For the ZKsync Rollup
Maria Inês Silva, Johnnatan Messias, Benjamin Livshits
TL;DR
This paper addresses the data accessibility bottleneck in blockchain research, particularly for ZKsync Era, by delivering a public, curated 1-year dataset of L2 activity. It details the dataset scope (Feb 14, 2023 to Mar 24, 2024), its schema comprising blocks, transactions, receipts, logs, and L2→L1 messages, and the Parquet-based preprocessing with Polars that enables efficient local analysis. The authors provide example analyses on gas usage, event deployments, and swaps to demonstrate practical, reproducible research workflows, and discuss future directions such as MEV/arbitrage, governance, and cross-rollup studies. Overall, the dataset and accompanying notebooks aim to lower barriers for researchers, enabling data-driven exploration of ZKsync Era and the broader L2 scaling ecosystem with reproducible methodologies and public code.
Abstract
Although blockchain data is publicly available, practical challenges and high costs often hinder its effective use by researchers, limiting data-driven research and exploration in the blockchain space. This is especially true for Layer-2 (L2) ecosystems, and for ZKsync in particular. To address these issues, we have curated a dataset covering 1 year of activity extracted from a ZKsync Era archive node and made it freely available to external parties. We describe this dataset and how it was created, showcase a few example analyses that can be performed with it, and discuss some future research directions.
