Reality Check for Tor Website Fingerprinting in the Open World

Mohammadhamed Shadbeh; Khashayar Khajavi; Tao Wang

TL;DR

This work re-examines website fingerprinting (WF) from a guard-relay vantage point, using a novel, privacy-preserving methodology that builds an open-world background from real, unlabeled Tor traffic paired with synthetic monitored traces. It shows that timing-independent classifiers are significantly more robust to network variability than others.

Abstract

Website fingerprinting (WF) attacks on Tor can infer user destinations from encrypted traffic metadata. However, their real-world effectiveness remains debated due to laboratory settings that fail to capture network fluctuations, evaluate noise, and create a representative open world. In this work, we re-examine WF from a guard-relay vantage point using a novel, privacy-preserving methodology that builds an open-world background from real, unlabeled Tor traffic paired with synthetic monitored traces. Using this methodology, we collect a large-scale dataset of over 800,000 traces. We then benchmark state-of-the-art WF attacks under a cross-network setting and show that WF remains highly effective against real Tor open-world traffic: the best-performing attack achieves 0.956 precision and 0.922 recall at a 9% base rate. We further present results that demonstrate robustness to small training sets, network jitter, and concept drift. Moreover, we show that timing-independent classifiers are significantly more robust to network variability than others. Finally, we provide the first systematic study of Tor's Conflux traffic-splitting, where we show that a guard node with a latency advantage can maintain high attack effectiveness even when traffic is split.
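The abstract reports precision at a stated base rate, and open-world precision depends on that base rate as well as on the classifier's error rates. The following is a minimal sketch of that dependence, assuming a binary monitored-vs-unmonitored decision. The function name and the input values are illustrative and are not taken from the paper.

```python
def precision_at_base_rate(tpr: float, fpr: float, base_rate: float) -> float:
    """Precision of a binary open-world detector at a given base rate.

    tpr: fraction of monitored visits flagged as monitored.
    fpr: fraction of unmonitored visits flagged as monitored.
    base_rate: fraction of all visits that are to monitored pages.
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must lie strictly between 0 and 1")
    true_pos = tpr * base_rate            # expected true positives per visit
    false_pos = fpr * (1.0 - base_rate)   # expected false positives per visit
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive predictions; precision is undefined")
    return true_pos / (true_pos + false_pos)


# Illustrative inputs only (not results from the paper): a base rate of 1/11
# means ten unmonitored visits for every monitored one.
example = precision_at_base_rate(tpr=0.9, fpr=0.01, base_rate=1 / 11)
print(round(example, 3))
```

With these hypothetical inputs, a 1% false positive rate already accounts for a tenth of all positive predictions, which is why the base rate has to be stated alongside any open-world precision figure.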

Paper Structure (75 sections, 3 equations, 13 figures, 7 tables, 1 algorithm)

Introduction
Our Contributions
Paper Organization
Background & Related Work
Tor
Website Fingerprinting
Circuits and Guards
Stream Isolation
Conflux
Methodology
Description of Our Methodology
Data Collection
Pre-Conflux Monitored Webpages
Post-Conflux Monitored Webpages
Data Collection at the Client
...and 60 more sections

Figures (13)

Figure 1: Overview of Tor traffic routing architectures. (a) In a standard Tor configuration, traffic flows sequentially through three intermediate nodes: (i) the guard node, which knows the client's IP address but not the destination; (ii) the middle node, which merely forwards the encrypted traffic; and (iii) the exit node, which connects to the destination server but remains oblivious to the client's original IP address. (b) Under Conflux, traffic between the client and the exit node is dynamically split across multiple linked legs. Each Conflux leg routes traffic similar to a standard Tor circuit. Both legs share the same exit node but use different guard and middle nodes.
Figure 2: An attacker can place themselves at different points in the Tor network to train their classifiers for WF attacks. Traditionally, the attacker is assumed to be an eavesdropper who observes the traffic to and from clients (Attacker A). For this, traces are commonly gathered through the Tor client itself to train the classifiers. Our approach is to gather both training and testing data from a Tor guard node (Attacker B). Cherubin et al. [cherubin2022online] gather the training data through a Tor exit node (Attacker C).
Figure 3: The $r$-precision ($r=10$) vs. recall curves for different classifiers across three scenarios of Table \ref{tab:ow-with-guard}. Dashed lines (where applicable) represent $r$-precision corrections using the Wilson score interval when the number of false positives is below 10.
Figure 4: Performance comparison across different open-world ratios ($\pi_r$). As $r$ increases, the precision collapses, revealing the robustness of different architectures to unmonitored noise. Dashed lines (where applicable) represent $r$-precision corrections using the Wilson score interval when the number of false positives is below 10.
Figure 5: The effect of monitored training set size on TPR. We report TPR at a fixed target FPR of $0.5\%$ when training on AU monitored traces and testing on CA monitored traces. Missing points indicate that no threshold achieved FPR $\le 0.5\%$ on the evaluation set.
...and 8 more figures
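The captions for Figures 3 and 4 mention correcting $r$-precision with the Wilson score interval when fewer than 10 false positives are observed. The sketch below shows one plausible form of such a correction. It assumes that $r$-precision is TPR / (TPR + $r$ · FPR) and that the correction replaces the observed FPR with its Wilson upper bound. Both assumptions, and all function names, are illustrative; the paper's exact procedure is not given in this excerpt.

```python
import math


def wilson_upper(successes: int, trials: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for a binomial proportion."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return center + half


def r_precision(tpr: float, fpr: float, r: float) -> float:
    """Assumed definition: precision when unmonitored visits are r times as common."""
    return tpr / (tpr + r * fpr)


def conservative_r_precision(tpr: float, false_pos: int, negatives: int,
                             r: float, z: float = 1.96) -> float:
    """r-precision with the observed FPR replaced by its Wilson upper bound."""
    return r_precision(tpr, wilson_upper(false_pos, negatives, z), r)


# Hypothetical counts: 3 false positives among 100,000 unmonitored traces.
naive = r_precision(tpr=0.9, fpr=3 / 100_000, r=10)
bounded = conservative_r_precision(tpr=0.9, false_pos=3, negatives=100_000, r=10)
print(naive > bounded)
```

A point estimate built on a handful of false positives can overstate precision, so bounding the FPR from above gives a more cautious curve in the low-false-positive regime.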
