Bitcoin Research with a Transaction Graph Dataset

Hugo Schnoering, Michalis Vazirgiannis

TL;DR

This paper presents a large-scale, temporally annotated graph dataset representing Bitcoin transactions, designed to advance research in blockchain analytics and beyond, and establishes baseline performance using graph neural network models for node classification tasks.

Abstract

Bitcoin, introduced in 2008 by Satoshi Nakamoto, established a new digital economy in which value can be stored and transferred in a fully decentralized manner, removing the need for a central authority. This paper introduces a large-scale dataset in the form of a transaction graph representing transactions between Bitcoin users, along with a set of tasks and baselines. The graph includes 252 million nodes and 785 million edges, covering a time span of nearly 13 years and 670 million transactions. Each node and edge is timestamped. For supervised tasks, we provide two labeled sets: (i) 33,000 nodes labeled by entity type and (ii) nearly 100,000 Bitcoin addresses labeled with an entity name and an entity type. This is the largest publicly available dataset of Bitcoin transactions designed to facilitate advanced research and exploration in this domain, overcoming the limitations of existing datasets. Various graph neural network models are trained to predict node labels, establishing a baseline for future research. In addition, several use cases are presented to demonstrate the dataset's applicability beyond Bitcoin analysis. Finally, all data and source code are made publicly available to enable reproducibility of the results.

Paper Structure

This paper contains 37 sections, 2 equations, 5 figures, 6 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Schematic of a transaction $\Delta$. Nodes with a single (resp. double) border represent TXOs (resp. transactions). TXOs consumed by $\Delta$ originate from prior transactions, while those created in $\Delta$ may serve as input TXOs in subsequent transactions.
  • Figure 2: Labeling pipeline.
  • Figure 3: Top: Frequency distribution of the number of messages per thread. Bottom: Frequency distribution of the number of posters per thread.
  • Figure 4: Distribution of categories among the labeled nodes.
  • Figure 5: Evolution of the number of nodes and edges in the graph as a function of the block index.
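
The transaction structure described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustrative model of how a transaction consumes TXOs created by earlier transactions and creates new ones. The names `TXO`, `Transaction`, and `apply_transaction`, the field layout, and the example values are all assumptions made for illustration and are not taken from the paper's code or dataset schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TXO:
    """Hypothetical minimal transaction output: a value locked to an address."""
    txid: str      # transaction that created this output
    index: int     # position among that transaction's outputs
    address: str
    value: int     # amount in satoshis


@dataclass
class Transaction:
    """Hypothetical minimal transaction linking consumed and created TXOs."""
    txid: str
    inputs: list   # TXOs consumed; each was created by a prior transaction
    outputs: list  # TXOs created; each may be consumed by a later transaction

    def fee(self) -> int:
        # Whatever the inputs carry beyond the outputs is left as the fee.
        return sum(t.value for t in self.inputs) - sum(t.value for t in self.outputs)


def apply_transaction(utxo_set: set, tx: Transaction) -> None:
    """Remove the TXOs consumed by tx from the unspent set and add the ones it creates."""
    for txo in tx.inputs:
        if txo not in utxo_set:
            raise ValueError(f"input {txo.txid}:{txo.index} is not unspent")
    utxo_set.difference_update(tx.inputs)
    utxo_set.update(tx.outputs)


# Example with made-up values: one prior TXO is consumed by a transaction "delta".
a0 = TXO("a", 0, "addr1", 50_000)
utxo_set = {a0}
delta = Transaction(
    txid="delta",
    inputs=[a0],
    outputs=[TXO("delta", 0, "addr2", 30_000), TXO("delta", 1, "addr1", 19_000)],
)
apply_transaction(utxo_set, delta)
```

After `apply_transaction` runs, `a0` is no longer in `utxo_set` and the two outputs of `delta` are, which mirrors the caption: consumed TXOs originate from prior transactions, and created TXOs may serve as inputs to subsequent ones.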