Assessing the Efficacy of Heuristic-Based Address Clustering for Bitcoin

Hugo Schnoering, Pierre Porthaux, Michalis Vazirgiannis

TL;DR

Bitcoin transaction analysis is hampered by the sheer number of addresses, which motivates clustering addresses into entity-level groups. The authors evaluate six heuristics (two established and four novel) and introduce the clustering ratio $r_k^h = |C_k^h| / |S_k|$ to quantify how much a heuristic reduces the address space, with a temporal analysis up to block index 700000. They find that common-input-ownership (CIO) heuristics roughly halve the number of entities, while change-based heuristics and the four novel variants offer additional reductions of 5–15%; a combined heuristic (CIO plus others) achieves a reduction of about 70%, from 874 million addresses to roughly 250 million clusters. The study provides practical guidance on selecting and combining heuristics for scalable blockchain analytics and highlights how changes in user behavior over time affect clustering power.
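
As a quick consistency check on the figures quoted above, assume that $|S_k|$ counts the addresses and $|C_k^h|$ the clusters obtained with the combined heuristic at the last block considered (this reading of the symbols is an assumption, not a definition taken from the paper). The rounded numbers then give:

$$ r_k^h \approx \frac{250 \times 10^6}{874 \times 10^6} \approx 0.29, \qquad 1 - r_k^h \approx 0.71 $$

A ratio of about 0.29 corresponds to a reduction of about 71%, which agrees with the reported figure of about 70%.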

Abstract

Exploring transactions within the Bitcoin blockchain entails examining the transfer of bitcoins among several hundred million entities. However, it is often impractical and resource-consuming to study such a vast number of entities. Consequently, entity clustering serves as an initial step in most analytical studies. This process often employs heuristics grounded in the practices and behaviors of these entities. In this research, we delve into the examination of two widely used heuristics, alongside the introduction of four novel ones. Our contribution includes the introduction of the clustering ratio, a metric designed to quantify the reduction in the number of entities achieved by a given heuristic. The assessment of this reduction ratio plays an important role in justifying the selection of a specific heuristic for analytical purposes. Given the dynamic nature of the Bitcoin system, characterized by a continuous increase in the number of entities on the blockchain, and the evolving behaviors of these entities, we extend our study to explore the temporal evolution of the clustering ratio for each heuristic. This temporal analysis enhances our understanding of the effectiveness of these heuristics over time.
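
To make the clustering ratio and its temporal evolution concrete, here is a minimal sketch in Python. It assumes the usual reading of the common-input-ownership heuristic (all input addresses of a transaction belong to one entity) and computes the ratio as the number of clusters divided by the number of addresses seen so far. The transaction format, the toy data, and the names `UnionFind` and `clustering_ratio_by_block` are hypothetical and chosen for illustration; the paper's exact definitions and edge cases (for example, coinbase transactions) are not reproduced here.

```python
class UnionFind:
    """Disjoint-set structure over addresses (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def add(self, x):
        # Register an address as its own singleton cluster if unseen.
        self.parent.setdefault(x, x)

    def find(self, x):
        # Locate the root, then compress the path iteratively.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def n_clusters(self):
        return len({self.find(x) for x in self.parent})


def clustering_ratio_by_block(transactions):
    """Return {block_index: clusters / addresses seen up to that block}.

    `transactions` is a list of (block_index, input_addrs, output_addrs).
    Only input addresses of the same transaction are merged (assumed CIO rule).
    """
    uf = UnionFind()
    ratios = {}
    for block, inputs, outputs in sorted(transactions, key=lambda t: t[0]):
        for addr in list(inputs) + list(outputs):
            uf.add(addr)
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
        # If a block holds several transactions, the last value is kept.
        ratios[block] = uf.n_clusters() / len(uf.parent)
    return ratios


# Toy data (invented for illustration, not taken from the blockchain).
TOY_TXS = [
    (1, ["a1", "a2"], ["b1"]),
    (2, ["a3"], ["b1", "a4"]),
    (3, ["a2", "a3"], ["c1"]),
]

ratios = clustering_ratio_by_block(TOY_TXS)
for block, ratio in ratios.items():
    print(block, round(ratio, 3))
```

In this toy run the ratio first rises when new, unlinked addresses appear in block 2 and falls again when the transaction in block 3 links previously separate addresses, which is the kind of behavior a temporal analysis of the ratio is meant to expose. A lower ratio means a stronger reduction of the address space.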

Paper Structure (22 sections, 2 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: Schematic of a transaction $\Delta$. Nodes with a single (resp. double) border symbolize TXOs (resp. transactions). TXOs consumed by $\Delta$ originate from prior transactions, while those created in $\Delta$ may serve as input TXOs in subsequent transactions.
  • Figure 2: Schematic of a payment transaction $\Delta$.
  • Figure 3: Schematic of a payment transaction $\Delta$ with change. Gray (resp. white) TXOs belong to user $u$ (resp. $u^\prime$).
  • Figure 4: Schematic of a script chain. Full edges represent the transfer of change. Dotted edges represent payments.
  • Figure 5: Evolution of the rounding exponent $i$ for $x=1$ dollar w.r.t. the block index.
  • ...and 4 more figures