
TSA-WF: Exploring the Effectiveness of Time Series Analysis for Website Fingerprinting

Michael Wrana, Uzma Maroof, Diogo Barradas

TL;DR

The paper reframes website fingerprinting (WF) as a time-series matching problem and introduces TSA-WF, a pipeline that preserves packet timing and direction so that classical time-series similarity measures can be applied to WF. It details prototype selection, multi-distance computation, a prediction model, and a method for untangling multi-tab traces to approximate where in a trace a monitored site was visited. On Tor traces, TSA-WF achieves 91.2% accuracy in the undefended, single-tab open-world setting. In 3-tab traces, it locates a monitored website to within about 10k packets with an 83.7% success rate, though it trails deep learning (DL)-based attacks in multi-tab settings. Overall, the work demonstrates the viability and complementarity of time-series approaches for WF and suggests directions for integrating them with deep learning for robust multi-tab analysis.
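
The prototype-based steps listed above can be illustrated with a minimal sketch under several stated assumptions. Each class prototype is taken to be the medoid of its training series under dynamic time warping (DTW). Two measures, DTW and zero-padded Euclidean distance, stand in for the multi-distance computation. Their scores are mean-normalized and summed. An optional DTW threshold rejects unmonitored traces. All function names, the choice of measures, and the threshold rule are hypothetical and are not TSA-WF's published implementation.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance with |x - y| as the local cost (O(n*m))."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(float(a[i - 1]) - float(b[j - 1]))
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def padded_euclidean(a, b):
    """Euclidean distance after zero-padding the shorter series (an assumption)."""
    n = max(len(a), len(b))
    a = np.pad(np.asarray(a, dtype=float), (0, n - len(a)))
    b = np.pad(np.asarray(b, dtype=float), (0, n - len(b)))
    return float(np.linalg.norm(a - b))

# Hypothetical set of measures; the paper's actual distance set may differ.
# The first entry (DTW) is also the one used for the open-world threshold.
MEASURES = (dtw, padded_euclidean)

def medoid(series, dist=dtw):
    """Prototype = the training series with the smallest summed distance to the rest."""
    totals = [sum(dist(s, t) for t in series) for s in series]
    return series[int(np.argmin(totals))]

def select_prototypes(train):
    """train: dict mapping site label -> list of 1-D time series."""
    return {label: medoid(seqs) for label, seqs in train.items()}

def predict(x, prototypes, threshold=None):
    """Nearest-prototype prediction combining several distance measures.

    Each measure's distances are divided by their mean across classes so that
    measures on different scales contribute comparably; the results are summed.
    If `threshold` is set and the raw DTW distance to the best prototype exceeds
    it, the trace is treated as unmonitored and None is returned.
    """
    labels = list(prototypes)
    scores = np.array([[measure(x, prototypes[lab]) for lab in labels]
                       for measure in MEASURES])          # (num_measures, num_classes)
    scale = np.maximum(scores.mean(axis=1, keepdims=True), 1e-12)  # avoid divide-by-zero
    combined = (scores / scale).sum(axis=0)
    best = int(np.argmin(combined))
    if threshold is not None and scores[0, best] > threshold:
        return None
    return labels[best]
```

Normalizing each measure before summing is one simple way to keep a large-scale measure from dominating the combination; rank-based fusion would be an equally plausible alternative.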

Abstract

Website fingerprinting (WF) is a technique that allows an eavesdropper to determine which website a target user is accessing by inspecting the metadata of the packets she exchanges over an encrypted tunnel, e.g., Tor. Recent WF attacks built using machine learning (and deep learning) process and summarize trace metadata during their feature extraction phases. As a result, their predictions lack information about the instant at which a given website is detected within a (potentially large) network trace composed of multiple sequential website accesses, a setting known as multi-tab WF. In this paper, we explore whether classical time series analysis techniques can be effective in the WF setting. Specifically, we introduce TSA-WF, a pipeline designed to closely preserve the timing and direction characteristics of network traces, which enables the exploration of algorithms designed to measure time series similarity in the WF context. Our evaluation with Tor traces reveals that TSA-WF achieves accuracy comparable to existing WF attacks in scenarios where website accesses can be easily singled out from a given trace (i.e., the single-tab WF setting), even when traces are shielded by specially designed WF defenses. Finally, while TSA-WF did not outperform existing attacks in the multi-tab setting, we show how TSA-WF can help pinpoint the approximate instant at which a given website of interest is visited within a multi-tab trace.

Note: This preprint has not undergone any post-submission improvements or corrections. The Version of Record of this contribution is published in the Proceedings of the 20th International Conference on Availability, Reliability and Security (ARES 2025).
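
To make the idea of a timing- and direction-preserving time series concrete, the following is a minimal sketch. It assumes a trace is given as packet timestamps (seconds from trace start) plus a direction of +1 (outgoing) or -1 (incoming), and that packets are counted in fixed-width time bins. The function name `to_time_series` and the `bin_width` parameter are hypothetical, and TSA-WF's actual encoding may differ. The sketch returns both a merged series (outgoing minus incoming per bin) and separate per-direction series, since both forms can be fed to a similarity measure.

```python
import numpy as np

def to_time_series(timestamps, directions, bin_width=0.05):
    """Bin a packet trace into fixed-width time slots.

    timestamps: packet times in seconds, relative to the start of the trace.
    directions: +1 for outgoing packets, -1 for incoming packets.
    bin_width:  hypothetical slot width in seconds (not taken from the paper).

    Returns (merged, outgoing, incoming) integer arrays of equal length,
    where merged[k] = outgoing[k] - incoming[k].
    """
    t = np.asarray(timestamps, dtype=float)
    d = np.asarray(directions)
    if t.size == 0:
        empty = np.zeros(0, dtype=int)
        return empty, empty, empty
    n_bins = int(np.floor(t.max() / bin_width)) + 1
    idx = np.floor(t / bin_width).astype(int)  # time slot of each packet
    outgoing = np.bincount(idx[d > 0], minlength=n_bins)
    incoming = np.bincount(idx[d < 0], minlength=n_bins)
    merged = outgoing - incoming
    return merged, outgoing, incoming
```

For example, `to_time_series([0.0, 0.2, 1.5, 1.7, 2.1], [1, -1, -1, -1, 1], bin_width=1.0)` places the packets in three one-second slots and yields a merged series of `[0, -2, 1]`.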

Paper Structure

This paper contains 23 sections, 7 figures, 7 tables, and 1 algorithm.

Figures (7)

  • Figure 1: A standard website fingerprinting threat model over Tor. A local adversary eavesdrops on Alice's encrypted communications while she accesses a set of websites.
  • Figure 2: Trace with and without separation of incoming and outgoing packets.
  • Figure 3: A time-series representation of two website traces from the same class.
  • Figure 4: Merged and separated representations of monitored and unmonitored traces.
  • Figure 5: Best match locations for different techniques.
  • ...and 2 more figures
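
Locating a monitored website inside a multi-tab trace, as in the best-match locations of Figure 5, can be illustrated with a brute-force subsequence search. This is a minimal sketch that assumes the multi-tab trace and a per-site prototype are already 1-D time series. It slides the prototype over every window of the trace and reports the start index with the smallest Euclidean distance. The name `best_match_location` is hypothetical, and the plain Euclidean distance is an assumption rather than necessarily the matching technique evaluated in the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def best_match_location(trace, prototype):
    """Return (start, distance) of the window of `trace` closest to `prototype`.

    Brute-force Euclidean subsequence search over every length-m window
    (an illustrative stand-in; requires NumPy >= 1.20 for sliding_window_view).
    """
    trace = np.asarray(trace, dtype=float)
    prototype = np.asarray(prototype, dtype=float)
    m = len(prototype)
    if m == 0 or m > len(trace):
        raise ValueError("prototype must be non-empty and no longer than the trace")
    windows = sliding_window_view(trace, m)            # shape (len(trace) - m + 1, m)
    dists = np.linalg.norm(windows - prototype, axis=1)
    start = int(np.argmin(dists))
    return start, float(dists[start])
```

When the series is indexed by packet rather than by time bin, the returned `start` is a packet offset into the multi-tab trace. The exhaustive scan costs O(n·m); a faster matrix-profile or early-abandoning search would be the natural replacement for long traces.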