A Comprehensive Survey of Website Fingerprinting Attacks and Defenses in Tor: Advances and Open Challenges

Yuwen Cui, Guangjing Wang, Khanh Vu, Kai Wei, Kehan Shen, Zhengyuan Jiang, Xiao Han, Ning Wang, Zhuo Lu, Yao Liu

TL;DR

This survey maps the website fingerprinting (WF) landscape in Tor by unifying datasets, attack methodologies, and defense strategies. It synthesizes traditional ML and deep learning WF attacks, evaluates them across closed-world and open-world settings, and surveys defenses spanning adaptive padding, regularization, morphing, and adversarial perturbations. The work identifies core challenges (data drift, multi-tab browsing, dataset realism, and deployment practicality) and outlines future directions, including data diversity, adaptive modeling, and the emerging role of LLMs in WF. By consolidating prior work and proposing a structured framework, the paper guides researchers and practitioners toward more robust and deployable privacy protections for Tor users.

Abstract

The Tor network provides users with strong anonymity by routing their internet traffic through multiple relays. While Tor encrypts traffic and hides IP addresses, it remains vulnerable to traffic analysis attacks such as website fingerprinting (WF), which achieve increasingly high accuracy even under open-world conditions. In response, researchers have proposed a variety of defenses, including adaptive padding, traffic regularization, traffic morphing, and adversarial perturbation, that seek to obfuscate or reshape traffic traces. However, these defenses often entail trade-offs between privacy, usability, and system performance. Despite extensive research, a comprehensive survey unifying WF datasets, attack methodologies, and defense strategies remains absent. This paper fills that gap by systematically categorizing existing WF research into three key domains: datasets, attack models, and defense mechanisms. We provide an in-depth comparative analysis of techniques, highlight their strengths and limitations under diverse threat models, and discuss emerging challenges such as multi-tab browsing and coarse-grained traffic features. By consolidating prior work and identifying open research directions, this survey serves as a foundation for advancing stronger privacy protection in Tor.

Paper Structure

This paper contains 59 sections, 13 equations, 10 figures, and 12 tables.

Figures (10)

  • Figure 1: Structure of the survey. Website fingerprinting research falls into three major categories based on the roles of existing works: datasets, WF attacks, and WF defenses. Each category has its own associated threat models.
  • Figure 2: Overview of the WF attack and defense location in Tor architecture.
  • Figure 3: Data representation between different layers of data transport.
  • Figure 4: Traffic timing trace with direction, bursts, and ordering.
  • Figure 5: An overview of the two primarily utilized forms of traffic data in WF studies.
  • ...and 5 more figures