Heuristic-Based Address Clustering in Cardano Blockchain

Mostafa Chegenizadeh, Sina Rafati Niya, Claudio J. Tessone

TL;DR

This paper adapts heuristic-based address clustering to Cardano's Extended UTXO (EUTXO) model so that on-chain activity can be analyzed at the entity level. It introduces two Cardano-specific heuristics, one based on multi-input patterns and another on staking/delegation relationships, and uses UnionFind to merge linked addresses into entity-level clusters. Applying the approach to Cardano data from 2017–2023 via DBSync and PySpark, the authors find that medium-sized entities hold about 9.67 addresses on average and that the overall cluster-size distribution follows a power law $p(x) \propto x^{-\alpha}$, with an improved fit when the heuristics are combined. The work enables wealth-distribution and asset-velocity analyses on Cardano, though it notes the presence of superclusters, likely caused by false-positive links, and calls for future methods to detect and prune such links.

Abstract

Blockchain technology has recently gained widespread popularity as a practical method of storing immutable data while preserving the privacy of users by anonymizing their real identities. This anonymization, however, significantly complicates the analysis of blockchain data. To address this problem, the literature has presented heuristic-based clustering algorithms as an effective way of linking all addresses controlled by the same entity. In this paper, considering the particular features of the Extended Unspent Transaction Outputs accounting model introduced by the Cardano blockchain, we propose two new heuristics for clustering Cardano payment addresses. Applying these heuristics and employing the UnionFind algorithm, we efficiently cluster all the addresses that appeared on the Cardano blockchain from September 2017 to January 2023, where each cluster represents a distinct entity. The results show that each medium-sized entity in the Cardano network owns and controls 9.67 payment addresses on average. The results also confirm that the distribution of entity sizes identified by our proposed heuristics is well described by a power law.
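The clustering step described in the abstract can be illustrated with a minimal UnionFind sketch. This is not the authors' implementation: the heuristic definitions here are simplified assumptions (addresses spent together as inputs of one transaction are linked, and addresses sharing a staking key are linked), and the names `cluster_addresses`, `tx_inputs`, and `stake_keys` are hypothetical, as is the toy data.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with union by size and path compression."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen items as singleton clusters.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_addresses(tx_inputs, stake_keys):
    """Group addresses into clusters (hypothetical helper).

    tx_inputs:  iterable of lists, each holding the input addresses of one
                transaction (assumed multi-input linking rule).
    stake_keys: dict mapping a payment address to its staking key
                (assumed staking-based linking rule).
    """
    uf = UnionFind()

    # Assumed rule 1: inputs of the same transaction share an owner.
    for addrs in tx_inputs:
        addrs = list(addrs)
        for addr in addrs:
            uf.find(addr)
        for other in addrs[1:]:
            uf.union(addrs[0], other)

    # Assumed rule 2: addresses with the same staking key share an owner.
    by_stake = defaultdict(list)
    for addr, key in stake_keys.items():
        by_stake[key].append(addr)
    for addrs in by_stake.values():
        uf.find(addrs[0])
        for other in addrs[1:]:
            uf.union(addrs[0], other)

    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return list(clusters.values())


# Toy data, invented for illustration only.
example_inputs = [["a1", "a2"], ["a2", "a3"], ["b1"]]
example_stakes = {"a3": "s1", "c1": "s1", "b1": "s2"}
example_clusters = cluster_addresses(example_inputs, example_stakes)
```

In this toy run, `a1`, `a2`, and `a3` are merged through shared transaction inputs, and `c1` joins them through the staking key it shares with `a3`, while `b1` stays alone. A single wrong link merges two whole clusters in the same way, which is one way superclusters can arise from false positives.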

Paper Structure

This paper contains 12 sections, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Example scenario 1: value transfer among different addresses of the same entity
  • Figure 2: Example scenario 2: value transfer from different addresses of the same entity
  • Figure 3: Example scenario 2: value transfer from different addresses of different entities
  • Figure 4: Number of new addresses appeared on the blockchain per day
  • Figure 5: Distribution of addresses per entities (clusters) determined by Heuristic 1
  • ...and 6 more figures