Linking Cryptoasset Attribution Tags to Knowledge Graph Entities: An LLM-based Approach

Régnier Avice, Bernhard Haslhofer, Zhidong Li, Jianlong Zhou

TL;DR

This work addresses inconsistent cryptoasset attribution tags, which can mislead forensic investigations, by proposing an end-to-end LLM-based pipeline that links attribution tags to well-defined knowledge graph concepts. The method combines candidate-set generation via filtering and blocking with a neural candidate selector that uses prompting strategies to choose the best knowledge graph entity or declare no match. Across the GraphSense TagPack, WatchYourBack, and DeFi Rekt datasets, the approach improves $F1$-score by up to $37.4\%$ over baselines. Candidate sets of $k=5$ entities reach a recall of up to $93\%$; GPT-4o achieves about $94\%$ $F1$, while local models reach around $90\%$. A cost analysis shows that choosing the most cost-effective configuration can cut costs by up to $90\%$ with only about a $1\%$ drop in performance. The work advances data quality in cryptoasset forensics and provides reproducible code and datasets to enable broader adoption and evaluation.

Abstract

Attribution tags form the foundation of modern cryptoasset forensics. However, inconsistent or incorrect tags can mislead investigations and even result in false accusations. To address this issue, we propose a novel computational method based on Large Language Models (LLMs) to link attribution tags with well-defined knowledge graph concepts. We implemented this method in an end-to-end pipeline and conducted experiments showing that our approach outperforms baseline methods by up to 37.4% in F1-score across three publicly available attribution tag datasets. By integrating concept filtering and blocking procedures, we generate candidate sets containing five knowledge graph entities, achieving a recall of 93% without the need for labeled data. Additionally, we demonstrate that local LLM models can achieve F1-scores of 90%, comparable to remote models which achieve 94%. We also analyze the cost-performance trade-offs of various LLMs and prompt templates, showing that selecting the most cost-effective configuration can reduce costs by 90%, with only a 1% decrease in performance. Our method not only enhances attribution tag quality but also serves as a blueprint for fostering more reliable forensic evidence.
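The candidate-set step summarized above can be illustrated with a minimal sketch. It assumes that blocking is approximated by ranking knowledge graph entity names by plain string similarity to the tag label and keeping the top $k$; the paper's actual filtering and blocking procedures are not given in this excerpt, so `similarity`, `candidate_set`, `recall_at_k`, and the sample entity names are hypothetical stand-ins.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed stand-in for blocking)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_set(tag_label: str, entities: list[str], k: int = 5) -> list[str]:
    """Return the k entity names most similar to the tag label."""
    ranked = sorted(entities, key=lambda e: similarity(tag_label, e), reverse=True)
    return ranked[:k]


def recall_at_k(examples: list[tuple[str, str]], entities: list[str], k: int = 5) -> float:
    """Fraction of (tag label, gold entity) pairs whose gold entity is in the candidate set."""
    hits = sum(1 for label, gold in examples if gold in candidate_set(label, entities, k))
    return hits / len(examples)


# Hypothetical toy data; not taken from the paper's datasets.
ENTITIES = ["BTC-e", "Binance", "Kraken", "Bitfinex", "Coinbase", "Poloniex", "Bitstamp"]
EXAMPLES = [("btc-e.com", "BTC-e"), ("Binance hot wallet", "Binance")]

if __name__ == "__main__":
    print(candidate_set("btc-e.com", ENTITIES, k=5))
    print(recall_at_k(EXAMPLES, ENTITIES, k=5))
```

Recall at $k$ here only measures whether the correct entity survives into the candidate set; picking the final entity, or declaring no match, is left to the separate candidate selector.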

Paper Structure

This paper contains 29 sections, 3 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Attribution Tag Example. Two attribution tags referencing the same cryptoasset address 0x123 owned by the real-world entity BTC-e.
  • Figure 2: Linking an Attribution Tag to the Knowledge Graph. Attribution tag instances are linked to concepts defined in the knowledge graph.
  • Figure 3: Approach Overview. The candidate set generator filters potential entities and the candidate selector module identifies the matching entity.
  • Figure 4: Prompt template. The template used for prompting candidate selection. It consists of several parts (e.g., SYS, FEW-SHOT, INPUT, etc.) and defines the instructions provided to an LLM to guide its response or generated output.
  • Figure 5: Model Performance with different Templates. GPT-4o's zero-shot results vary significantly across different templates, while Llama-3 8B has a more stable performance.
  • ...and 3 more figures
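
The Figure 4 caption describes a prompt template made of parts such as SYS, FEW-SHOT, and INPUT. A rough sketch of how such a prompt might be assembled is shown below; the wording of each part, the numbered candidate layout, and the `build_prompt` helper are illustrative assumptions, not the template used in the paper.

```python
def build_prompt(tag: str, candidates: list[str], few_shot: tuple = ()) -> str:
    """Assemble a candidate-selection prompt from SYS, FEW-SHOT, and INPUT parts.

    All wording below is hypothetical; only the part names come from the caption.
    """
    sys_part = "SYS: You link cryptoasset attribution tags to knowledge graph entities."
    # Each few-shot example is a (tag, answer) pair shown before the real input.
    shot_lines = [f"FEW-SHOT: tag={t!r} -> answer={a}" for t, a in few_shot]
    # Number the candidates and add an explicit option for declaring no match.
    options = [f"{i}. {name}" for i, name in enumerate(candidates, start=1)]
    options.append("0. no match")
    input_part = "INPUT: tag=" + repr(tag) + "\n" + "\n".join(options)
    return "\n".join([sys_part, *shot_lines, input_part])


if __name__ == "__main__":
    print(build_prompt("btc-e.com", ["BTC-e", "Binance"], few_shot=(("kraken.com", 1),)))
```

Keeping an explicit "no match" option in the list mirrors the selector's ability to reject every candidate rather than being forced to pick one.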