BlockScan: Detecting Anomalies in Blockchain Transactions

Jiahao Yu, Xian Wu, Hao Liu, Wenbo Guo, Xinyu Xing

TL;DR

BlockScan tackles anomaly detection in DeFi transactions by engineering a multi-modal tokenizer and a one-stage BERT-style MLM foundation that can manage long blockchain sequences with RoPE and FlashAttention. Anomalies are detected via reconstruction errors on masked tokens, backed by a theoretical bound that links detection efficacy to distributional divergence between benign and malicious transactions. Empirical results on Ethereum and Solana show BlockScan surpassing diverse baselines with lower false positives and higher recall, and ablation studies confirm the importance of tokenizer design, log encoding, and efficient attention. The work also contributes open-source code and datasets, establishing a practical, theoretically grounded benchmark for Transformer-based blockchain analysis.

Abstract

We propose BlockScan, a customized Transformer for anomaly detection in blockchain transactions. Unlike existing methods that rely on rule-based systems or directly apply off-the-shelf large language models (LLMs), BlockScan introduces a series of customized designs to effectively model the unique data structure of blockchain transactions. First, a blockchain transaction is multi-modal, containing blockchain-specific tokens, texts, and numbers. We design a novel modularized tokenizer to handle these multi-modal inputs, balancing the information across different modalities. Second, we design a customized masked language modeling mechanism for pretraining the Transformer architecture, incorporating RoPE embedding and FlashAttention for handling longer sequences. Finally, we design a novel anomaly detection method based on the model outputs. We further provide theoretical analysis for the detection method of our system. Extensive evaluations on Ethereum and Solana transactions demonstrate BlockScan's exceptional capability in anomaly detection while maintaining a low false positive rate. Remarkably, BlockScan is the only method that successfully detects anomalous transactions on Solana with high accuracy, whereas all other approaches achieved very low or zero detection recall scores. This work sets a new benchmark for applying Transformer-based approaches in blockchain data analysis.
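The detection idea in the abstract (score a transaction by how poorly a model pretrained on benign data reconstructs its masked tokens) can be illustrated with a small sketch. Everything named here is hypothetical: `reconstruction_score`, `make_unigram_model`, the 15% mask ratio, and the number of masking rounds are assumptions for illustration, and the toy unigram model merely stands in for the pretrained Transformer, which is not reproduced.

```python
import math
import random
from collections import Counter

MASK = "[MASK]"  # assumed mask token name

def reconstruction_score(tokens, predict_proba, mask_ratio=0.15, rounds=8, seed=0):
    """Mean negative log-likelihood of the true tokens at randomly masked positions.

    predict_proba(masked_tokens, position) must return a dict mapping
    candidate tokens to probabilities for the masked position.
    """
    rng = random.Random(seed)
    k = max(1, int(len(tokens) * mask_ratio))  # mask at least one token per round
    losses = []
    for _ in range(rounds):
        positions = rng.sample(range(len(tokens)), k)
        masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
        for i in positions:
            p = predict_proba(masked, i).get(tokens[i], 0.0)
            losses.append(-math.log(max(p, 1e-9)))  # clamp to avoid log(0)
    return sum(losses) / len(losses)

def make_unigram_model(benign_sequences):
    """Toy stand-in for a masked language model: ignores context entirely
    and returns token frequencies observed in benign sequences."""
    counts = Counter(t for seq in benign_sequences for t in seq)
    total = sum(counts.values())
    probs = {t: c / total for t, c in counts.items()}
    return lambda masked_tokens, position: probs

# Usage: tokens unseen in benign data are reconstructed poorly and score higher.
benign = [["transfer", "A", "B"]] * 10
model = make_unigram_model(benign)
benign_score = reconstruction_score(["transfer", "A", "B"], model)
suspect_score = reconstruction_score(["selfdestruct", "X", "Y"], model)
```

Under this sketch, transactions would be ranked by score and the highest-scoring ones flagged; how the cut-off is chosen is not specified in the text above.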

Paper Structure (30 sections, 1 theorem, 4 equations, 2 figures, 10 tables, 1 algorithm)

Key Result

Theorem 5.1

The expected loss of $h^*$ on the malicious distribution $P_{\text{mal}}$ satisfies a bound whose right-hand side has three terms: the benign loss; $d_{\mathcal{H}\Delta\mathcal{H}}$, the $\mathcal{H}\Delta\mathcal{H}$-divergence between $P_{\text{benign}}$ and $P_{\text{mal}}$; and $\lambda$, a constant representing the minimum joint error of any hypothesis in $\mathcal{H}$.
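For orientation, the domain-adaptation bound of Ben-David et al. (2010), the work this theorem cites, takes the following shape when written with the symbols above. The loss notation $\mathcal{L}_{P}(h)$, the direction of the inequality, and the factor $\tfrac{1}{2}$ are carried over from that standard result and are assumptions here, not a transcription of the paper's own statement:

$$
\mathcal{L}_{P_{\text{mal}}}(h^*) \;\le\; \mathcal{L}_{P_{\text{benign}}}(h^*) \;+\; \frac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}\!\left(P_{\text{benign}}, P_{\text{mal}}\right) \;+\; \lambda
$$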

Figures (2)

  • Figure 1: Tokenizer of BlockScan. BlockScan tokenizes transactions by flattening nested JSON using depth-first search, assigns unique tokens to frequent addresses while marking infrequent ones as "OOV", and uses special tokens ("[START]", "[END]", "[Ins]") to mark function artifacts.
  • Figure 2: Performance comparison of BlockGPT and BlockScan on the larger dataset.
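The tokenization steps named in the Figure 1 caption (depth-first flattening of nested JSON, dedicated tokens for frequent addresses, "OOV" for infrequent ones) can be sketched roughly as follows. This is a minimal illustration, not the authors' tokenizer: the helper names, the Ethereum-style address check, and the `min_count` threshold are all assumptions. Wrapping the whole transaction in "[START]"/"[END]" is a simplification, since the caption says these tokens mark function artifacts, and "[Ins]" is not modelled.

```python
from collections import Counter

def flatten(node, out):
    """Depth-first traversal of nested JSON-like data.
    Emits dict keys and leaf values as strings, in visit order."""
    if isinstance(node, dict):
        for key, value in node.items():
            out.append(str(key))
            flatten(value, out)
    elif isinstance(node, list):
        for item in node:
            flatten(item, out)
    else:
        out.append(str(node))
    return out

def is_address(tok):
    # Assumption: Ethereum-style address, "0x" followed by 40 characters.
    return tok.startswith("0x") and len(tok) == 42

def build_address_vocab(transactions, min_count=2):
    """Collect addresses seen at least min_count times (hypothetical threshold)."""
    counts = Counter(
        tok for tx in transactions for tok in flatten(tx, []) if is_address(tok)
    )
    return {addr for addr, n in counts.items() if n >= min_count}

def tokenize(tx, address_vocab):
    """Flatten one transaction; infrequent addresses collapse to "OOV"."""
    tokens = ["[START]"]
    for tok in flatten(tx, []):
        if is_address(tok) and tok not in address_vocab:
            tokens.append("OOV")
        else:
            tokens.append(tok)
    tokens.append("[END]")
    return tokens

# Usage: address A appears twice and keeps its own token; B appears once.
A = "0x" + "a" * 40
B = "0x" + "b" * 40
txs = [{"from": A, "calls": [{"to": A, "value": 1}]}, {"from": B, "value": 2}]
vocab = build_address_vocab(txs)
tokens_b = tokenize(txs[1], vocab)
```

Collapsing rare addresses keeps the vocabulary bounded, at the cost of making all infrequent addresses indistinguishable from one another.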

Theorems & Definitions (1)

  • Theorem 5.1 (Ben-David et al., 2010)