Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis

Xinhao Deng, Qi Li, Ke Xu

TL;DR

Holmes addresses the practicality gap in deep-learning website fingerprinting (WF) by enabling reliable identification from early-stage Tor traffic. It combines adaptive data augmentation based on the temporal distribution of traffic with a supervised-contrastive embedding that correlates early-stage and complete traffic in an embedding space, then uses per-website centroids and MAD-based radii to perform interval-wise identification. By rejecting low-confidence identifications and adapting how much traffic it collects, Holmes remains robust under multiple defenses and on real-world dark web traffic, outperforming nine baselines in accuracy, precision, and latency. The work demonstrates a practical, real-time approach to WF and provides insights for designing defenses in privacy-preserving networks.
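The centroid-and-radius step in the summary above can be illustrated with a minimal sketch. It assumes that MAD means median absolute deviation, that each website's radius is the median distance to its centroid plus a multiple of the MAD, and that an embedding outside every radius is rejected. The function names, the `scale` parameter, and the radius rule are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import median


def centroid(vectors):
    """Component-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_regions(embeddings_by_site, scale=3.0):
    """Map each site to (centroid, radius).

    Assumed rule: radius = median distance to the centroid
    plus `scale` times the median absolute deviation (MAD).
    """
    regions = {}
    for site, vectors in embeddings_by_site.items():
        c = centroid(vectors)
        dists = [math.dist(v, c) for v in vectors]
        med = median(dists)
        mad = median([abs(d - med) for d in dists])
        regions[site] = (c, med + scale * mad)
    return regions


def identify(embedding, regions):
    """Return the nearest site whose radius covers the embedding, else None."""
    best_site, best_dist = None, math.inf
    for site, (c, radius) in regions.items():
        d = math.dist(embedding, c)
        if d <= radius and d < best_dist:
            best_site, best_dist = site, d
    return best_site


def identify_early(interval_embeddings, regions):
    """Check one embedding per collection interval; stop at the first accepted one.

    Returns (interval_index, site), or None if every interval is rejected.
    """
    for index, embedding in enumerate(interval_embeddings):
        site = identify(embedding, regions)
        if site is not None:
            return index, site
    return None
```

In this sketch, a rejection (`None`) at one interval simply means more traffic is collected before the next check, which mirrors the idea of rejecting low-confidence identifications rather than forcing an early guess.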

Abstract

Website Fingerprinting (WF) attacks identify the websites that users visit by analyzing their traffic, compromising user privacy. Deep learning (DL) based WF attacks in particular demonstrate impressive attack performance. However, their effectiveness depends on collecting the complete and pure traffic of a page load, which limits the practicality of these attacks. WF performance is low under dynamic network conditions and various WF defenses, particularly when the analyzed traffic is only a small part of the complete traffic. In this paper, we propose Holmes, a robust and reliable early-stage WF attack. Holmes uses temporal and spatial distribution analysis of website traffic to identify websites in the early stages of page loading. Specifically, Holmes develops adaptive data augmentation based on the temporal distribution of website traffic and uses supervised contrastive learning to extract the correlations between early-stage traffic and pre-collected complete traffic. Holmes then identifies traffic in the early stages of page loading by computing the correlation of the traffic with the spatial distribution information, which enables robust and reliable detection from early-stage traffic. We extensively evaluate Holmes using six datasets. Compared to nine existing DL-based WF attacks, Holmes improves the F1-score of identifying early-stage traffic by an average of 169.18%. Furthermore, we replay the traffic of visits to real-world dark web websites. Holmes successfully identifies dark web websites when, on average, only 21.71% of the page has loaded, with an average precision improvement of 169.36% over existing WF attacks.
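To make the supervised contrastive step concrete, the sketch below computes the standard supervised contrastive loss over a small batch of embeddings. Samples with the same website label, for example an early-stage prefix and the complete trace of the same site, are treated as positives. Whether Holmes uses exactly this formulation is not stated in the abstract; the function names and the `temperature` value are assumptions made for illustration.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss, averaged over anchors with positives.

    For each anchor, same-label samples are pulled closer and all other
    samples are pushed away through a softmax over scaled similarities.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

A lower loss means that same-site embeddings sit closer together than different-site embeddings, which is the property a distance-based identification step in the embedding space would rely on.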

Paper Structure

This paper contains 24 sections, 8 equations, 13 figures, 3 tables, 2 algorithms.

Figures (13)

  • Figure 1: Comparison of the early-stage WF attack with existing WF attacks. The early-stage WF attack can identify websites in the early stage of page loading.
  • Figure 2: Distribution of page load times and number of packets for Alexa-top 10k websites.
  • Figure 3: The threat model of Holmes.
  • Figure 4: Visualization of the temporal distribution, based on the feature attribution method SHAP (Lundberg and Lee, 2017).
  • Figure 5: The overview of Holmes.
  • ...and 8 more figures