Linking Cryptoasset Attribution Tags to Knowledge Graph Entities: An LLM-based Approach
Régnier Avice, Bernhard Haslhofer, Zhidong Li, Jianlong Zhou
TL;DR
This work tackles the problem of inconsistent cryptoasset attribution tags, which can mislead forensic investigations. It proposes an end-to-end LLM-based pipeline that links attribution tags to well-defined knowledge graph concepts. The method combines candidate-set generation via concept filtering and blocking with a neural candidate selector that uses prompting strategies to choose the best-matching knowledge graph entity or to declare that no match exists. Across the GraphSense TagPack, WatchYourBack, and DeFi Rekt datasets, the approach improves F1-score by up to 37.4% over baselines, and candidate sets of k = 5 entities reach a recall of up to 93%. GPT-4o achieves an F1-score of about 94%, while local models reach around 90%. A cost analysis shows that costs can be reduced by up to 90% with minimal performance loss. The work advances data quality in cryptoasset forensics and provides reproducible code and datasets to enable broader adoption and evaluation.
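The two stages described above, candidate-set generation followed by LLM-based selection with an explicit no-match option, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Entity` schema, category-based filtering, character-trigram Jaccard similarity as the blocking function, the prompt wording, and all helper names are hypothetical stand-ins, and the LLM is passed in as any callable that maps a prompt string to an answer string.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Entity:
    """A knowledge graph concept (hypothetical minimal schema)."""
    entity_id: str
    label: str
    category: str  # e.g. "exchange" or "mixer"; used only for filtering


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Lower-cased character n-grams, padded so short strings still yield grams."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_set(tag: str, entities: list[Entity],
                  allowed_categories: set[str], k: int = 5) -> list[Entity]:
    """Filtering, then blocking: keep the k allowed entities most similar to the tag."""
    # Filtering (assumed rule): drop concepts whose category cannot match this tag.
    pool = [e for e in entities if e.category in allowed_categories]
    # Blocking (assumed similarity): rank by trigram Jaccard similarity to the tag.
    grams = char_ngrams(tag)
    pool.sort(key=lambda e: jaccard(grams, char_ngrams(e.label)), reverse=True)
    return pool[:k]


def build_prompt(tag: str, candidates: list[Entity]) -> str:
    """Numbered multiple-choice prompt; option 0 lets the model declare no match."""
    lines = [f"Attribution tag: {tag!r}", "Which entity does this tag refer to?"]
    lines += [f"{i}. {e.label} ({e.entity_id})" for i, e in enumerate(candidates, 1)]
    lines += ["0. None of the above", "Answer with a single number."]
    return "\n".join(lines)


def select(tag: str, candidates: list[Entity],
           llm: Callable[[str], str]) -> Optional[Entity]:
    """Ask the LLM to pick a candidate; any unparsable or 0 answer means no match."""
    answer = llm(build_prompt(tag, candidates)).strip()
    if not answer.isdigit():
        return None
    idx = int(answer)
    return candidates[idx - 1] if 1 <= idx <= len(candidates) else None


def recall_at_k(gold: dict[str, str], sets: dict[str, list[Entity]]) -> float:
    """Share of tags whose gold entity id appears in their candidate set."""
    if not gold:
        return 0.0
    hits = sum(gold[t] in {e.entity_id for e in sets[t]} for t in gold)
    return hits / len(gold)
```

Offering an explicit "None of the above" option lets the selector abstain instead of forcing a link to the closest candidate, which matches the pipeline's ability to declare no match. Because recall@k only checks whether the correct entity survives blocking, it can be computed for the candidate generator independently of any LLM.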
Abstract
Attribution tags form the foundation of modern cryptoasset forensics. However, inconsistent or incorrect tags can mislead investigations and even result in false accusations. To address this issue, we propose a novel computational method based on Large Language Models (LLMs) to link attribution tags with well-defined knowledge graph concepts. We implemented this method in an end-to-end pipeline and conducted experiments showing that our approach outperforms baseline methods by up to 37.4% in F1-score across three publicly available attribution tag datasets. By integrating concept filtering and blocking procedures, we generate candidate sets containing five knowledge graph entities, achieving a recall of 93% without the need for labeled data. Additionally, we demonstrate that local LLMs can achieve F1-scores of 90%, comparable to the 94% achieved by remote models. We also analyze the cost-performance trade-offs of various LLMs and prompt templates, showing that selecting the most cost-effective configuration can reduce costs by 90%, with only a 1% decrease in performance. Our method not only enhances attribution tag quality but also serves as a blueprint for fostering more reliable forensic evidence.
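The cost-performance analysis amounts to choosing, among evaluated combinations of LLM and prompt template, the cheapest configuration whose F1-score stays close to the best one. The Python below is a minimal sketch assuming each configuration has already been scored and costed on the same evaluation set; the `Config` fields, the fixed F1 tolerance rule, and the example figures are hypothetical and do not reproduce the paper's measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str        # hypothetical label for an LLM + prompt-template pair
    f1: float        # F1-score on a labeled evaluation set, in [0, 1]
    cost_usd: float  # total inference cost of that evaluation run


def cheapest_within(configs: list[Config], max_f1_drop: float) -> Config:
    """Lowest-cost config whose F1 is at most `max_f1_drop` below the best F1."""
    best_f1 = max(c.f1 for c in configs)
    # The small epsilon absorbs floating-point rounding in the subtraction.
    eligible = [c for c in configs if best_f1 - c.f1 <= max_f1_drop + 1e-9]
    return min(eligible, key=lambda c: c.cost_usd)


def tradeoff(reference: Config, chosen: Config) -> tuple[float, float]:
    """Relative cost reduction and absolute F1 drop of `chosen` vs. `reference`."""
    return 1 - chosen.cost_usd / reference.cost_usd, reference.f1 - chosen.f1


if __name__ == "__main__":
    # Made-up figures for illustration only.
    configs = [Config("A", 0.90, 8.0), Config("B", 0.895, 2.0), Config("C", 0.80, 0.5)]
    chosen = cheapest_within(configs, max_f1_drop=0.01)
    saving, drop = tradeoff(configs[0], chosen)
    print(f"{chosen.name}: {saving:.0%} cheaper, F1 drop {drop:.3f}")
    # prints "B: 75% cheaper, F1 drop 0.005"
```

A fixed tolerance keeps the rule easy to audit; a Pareto front over (cost, F1) would be an alternative when no single acceptable performance loss is agreed on in advance.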
