RECTor: Robust and Efficient Correlation Attack on Tor
Binghui Wu, Dinil Mon Divakaran, Levente Csikor, Mohan Gurusamy
TL;DR
Targeting Tor's vulnerability to traffic correlation under noisy and partial observations, the paper introduces RECTor, a robust and scalable framework that combines attention-based Multiple Instance Learning, GRU-based temporal encoding, a Siamese embedding network, and approximate nearest neighbor search. RECTor learns discriminative flow embeddings from windowed segments, which keeps matching cost near-linear in the number of flows and makes the attack resilient to missing data and background noise. Empirical results show RECTor surpasses DeepCorr, DeepCOFFEA, and FlowTracker, achieving up to 60% higher true positive rates under high noise and reducing training and inference time by over 50%. The work underscores practical vulnerabilities in Tor and outlines defense directions, including learning-aware countermeasures and protocol-level flow mixing, with source code released to facilitate further research.
Abstract
Tor is a widely used anonymity network that conceals user identities by routing traffic through encrypted relays, yet it remains vulnerable to traffic correlation attacks that deanonymize users by matching patterns in ingress and egress traffic. Existing correlation methods, however, suffer from two major limitations: limited robustness to noise and partial observations, and poor scalability due to computationally expensive pairwise matching. To address these challenges, we propose RECTor, a machine learning-based framework for traffic correlation under realistic conditions. RECTor employs attention-based Multiple Instance Learning (MIL) and GRU-based temporal encoding to extract robust flow representations, even when traffic data is incomplete or obfuscated. These embeddings are mapped into a shared space via a Siamese network and efficiently matched using approximate nearest neighbor (aNN) search. Empirical evaluations show that RECTor outperforms state-of-the-art baselines such as DeepCorr, DeepCOFFEA, and FlowTracker, achieving up to 60% higher true positive rates under high-noise conditions and reducing training and inference time by over 50%. Moreover, RECTor demonstrates strong scalability: inference cost grows near-linearly as the number of flows increases. These findings reveal critical vulnerabilities in Tor's anonymity model and highlight the need for advanced model-aware defenses.
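The pipeline described above (per-window embeddings pooled by attention, then matched in a shared embedding space) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the GRU encoder and Siamese training are omitted, the attention parameters `V` and `w` are random rather than learned, the function names `attention_pool` and `match_flows` are hypothetical, and an exact cosine-similarity search stands in for the approximate nearest neighbor index.

```python
import numpy as np

def attention_pool(instances, V, w):
    """Attention-based MIL pooling over one flow's window embeddings.

    instances: (n_windows, d) array, one row per traffic window.
    V: (d, h) projection and w: (h,) scoring vector (hypothetical,
    untrained parameters used only for illustration).
    Returns the pooled flow embedding (d,) and the attention weights.
    """
    scores = np.tanh(instances @ V) @ w           # one score per window
    scores = scores - scores.max()                # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ instances, weights

def match_flows(ingress_emb, egress_emb):
    """Return, for each ingress flow, the index of the most similar egress flow.

    Exact cosine search is used here as a stand-in for aNN search.
    """
    a = ingress_emb / np.linalg.norm(ingress_emb, axis=1, keepdims=True)
    b = egress_emb / np.linalg.norm(egress_emb, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)

rng = np.random.default_rng(0)
d, h, n_windows, n_flows = 8, 4, 5, 6
V = rng.normal(size=(d, h))
w = rng.normal(size=h)

# Toy data: each egress flow is a slightly perturbed copy of its ingress flow.
ingress_windows = rng.normal(size=(n_flows, n_windows, d))
egress_windows = ingress_windows + 0.01 * rng.normal(size=ingress_windows.shape)

ingress_emb = np.stack([attention_pool(x, V, w)[0] for x in ingress_windows])
egress_emb = np.stack([attention_pool(x, V, w)[0] for x in egress_windows])
matches = match_flows(ingress_emb, egress_emb)
```

The exact search computes all pairwise similarities, which is the quadratic cost the abstract criticises. Replacing `match_flows` with an approximate nearest neighbor index is what would bring matching cost closer to linear in the number of flows.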
