HiLoMix: Robust High- and Low-Frequency Graph Learning Framework for Mixing Address Association
Xiaofan Tu, Tiantian Duan, Shuyi Miao, Hanwen Zhang, Yi Sun
TL;DR
HiLoMix targets the deanonymization of mixing addresses on Ethereum, where labels are both noisy and scarce. It builds a Heterogeneous Attributed Mixing Interaction Graph (HAMIG), applies frequency-aware graph contrastive learning, jointly trains high-pass and low-pass GNNs with confidence-weighted supervision, and fuses their predictions by stacking heterogeneous models. The approach achieves state-of-the-art results on classification and ranking metrics, with notable improvements such as $F_1$ gains of $5.69\%$, AUC gains of $7.34\%$, and MRR gains of $15.61\%$, and it provides a curated ground-truth dataset for future study. The work offers a scalable, robust method for identifying mixing address associations in large blockchain graphs, with practical implications for blockchain privacy and enforcement.
Abstract
As mixing services are increasingly exploited by malicious actors for illicit transactions, mixing address association has emerged as a critical research task. A range of approaches have been explored, with graph-based models standing out for their ability to capture structural patterns in transaction networks. However, these approaches face two main challenges, label noise and label scarcity, which lead to suboptimal performance and limited generalization. To address these challenges, we propose HiLoMix, a graph-based learning framework specifically designed for mixing address association. First, we construct the Heterogeneous Attributed Mixing Interaction Graph (HAMIG) to enrich the topological structure. Second, we introduce frequency-aware graph contrastive learning that captures complementary structural signals from high- and low-frequency graph views. Third, we employ weakly supervised learning that assigns confidence-based weights to noisy labels. We then jointly train high-pass and low-pass GNNs using both unsupervised contrastive signals and confidence-based supervision to learn robust node representations. Finally, we adopt a stacking framework to fuse predictions from multiple heterogeneous models, further improving generalization and robustness. Experimental results demonstrate that HiLoMix outperforms existing methods in mixing address association.
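As a rough illustration of the high-pass/low-pass views and the confidence-based weighting described above, the sketch below uses NumPy. It is not the authors' implementation: the filter forms (a self-loop-normalized adjacency for the low-pass view, and the identity minus that matrix for the high-pass view), the binary cross-entropy loss, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the symmetric normalization with self-loops."""
    a = adj + np.eye(adj.shape[0])
    deg = a.sum(axis=1)                      # always > 0 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def low_pass(adj, x):
    """Assumed low-pass view: smooth each node's features over its neighborhood."""
    return normalized_adjacency(adj) @ x


def high_pass(adj, x):
    """Assumed high-pass view: keep each node's deviation from the smoothed signal."""
    return x - normalized_adjacency(adj) @ x


def confidence_weighted_bce(probs, noisy_labels, confidence):
    """Binary cross-entropy where each noisy label is scaled by its confidence."""
    p = np.clip(probs, 1e-12, 1.0 - 1e-12)   # avoid log(0)
    per_sample = -(noisy_labels * np.log(p) + (1.0 - noisy_labels) * np.log(1.0 - p))
    return float(np.sum(confidence * per_sample) / np.sum(confidence))


# Toy triangle graph with one scalar feature per node (hypothetical data).
adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)
x = np.array([[1.0], [2.0], [4.0]])

smooth = low_pass(adj, x)    # neighborhood-averaged features
sharp = high_pass(adj, x)    # residual after smoothing

# The second label is treated as unreliable and given zero confidence.
loss = confidence_weighted_bce(
    probs=np.array([0.9, 0.1]),
    noisy_labels=np.array([1.0, 1.0]),
    confidence=np.array([1.0, 0.0]),
)
```

On this triangle graph the low-pass view replaces every feature with the average of the three, while the high-pass view keeps each node's deviation from that average. The label with zero confidence contributes nothing to the loss, which is the intended effect of down-weighting labels that are likely to be noisy.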
