Lightening the Load: A Cluster-Based Framework for A Lower-Overhead, Provable Website Fingerprinting Defense

Khashayar Khajavi, Tao Wang

TL;DR

This work presents a unified framework for designing an adaptive WF defense that combines the effectiveness of regularization with the provable security of supersequence-style grouping. It instantiates the design as Adaptive Tamaraw, a variant of Tamaraw that assigns padding parameters on a per-cluster basis while retaining its original information-theoretic guarantee.

Abstract

Website fingerprinting (WF) attacks remain a significant threat to encrypted traffic, prompting the development of a wide range of defenses. Among these, two prominent classes are regularization-based defenses, which shape traffic using fixed padding rules, and supersequence-based approaches, which conceal traces among predefined patterns. In this work, we present a unified framework for designing an adaptive WF defense that combines the effectiveness of regularization with the provable security of supersequence-style grouping. The scheme first extracts behavioural patterns from traces and clusters them into (k,l)-diverse anonymity sets; an early-time-series classifier (adapted from ECDIRE) then switches from a conservative global set of regularization parameters to the lighter, set-specific parameters. We instantiate the design as Adaptive Tamaraw, a variant of Tamaraw that assigns padding parameters on a per-cluster basis while retaining its original information-theoretic guarantee. Comprehensive experiments on public real-world datasets confirm the benefits. By tuning k, operators can trade privacy for efficiency: in its high-privacy mode Adaptive Tamaraw pushes the bound on any attacker's accuracy below 30%, whereas in efficiency-centred settings it cuts total overhead by 99% compared with classic Tamaraw.

Paper Structure

This paper contains 36 sections, 3 theorems, 29 equations, 10 figures, 10 tables, 1 algorithm.

Key Result

Theorem 5.1

Let $\mathcal{S}$ be the set of anonymity sets constructed in the anonymity-set generation section (sec:AS_gen). For a fixed global regularization parameter pair $(p_{\text{in}}, p_{\text{out}})$, let $\bar{A}(S_i; p_{\text{in}}, p_{\text{out}})$ denote the expected attacker success rate over anonymity set $S_i \in \mathcal{S}$. Then Adaptive Tamaraw is non-uniformly weighted $\delta$-non-injective, and the attacker’s average

Figures (10)

  • Figure 1: Visualization of the TAMs of four traces from the same page, in the dataset collected by Sirinam et al. (2018). Each TAM divides the first 1000 time-slots (80 ms per slot) into rows for outgoing packets (blue, positive values) and incoming packets (red, negative values); the height of each bar records the packet count in that slot. The first two traces (top row) exhibit a very similar pattern, while the next two traces (bottom row) share a distinct structure. This highlights the possibility of multiple recurring traffic patterns within a single page.
  • Figure 2: Comparison of website-level vs. pattern-level clustering. For each value of $k$, clustering is applied to supermatrices constructed at the website or pattern level, and the resulting average bandwidth overhead is measured. Pattern-level clustering consistently results in lower overhead, especially as $k$ increases, indicating that clustering finer-grained traffic patterns captures homogeneity more effectively than aggregating at the website level.
  • Figure 3: High-level workflow of the first two phases of our defense. (1) Pattern extraction: for each webpage in the training dataset, we group its traces into a small number of stable, recurring traffic patterns (dashed boxes), reflecting variability due to CDNs, localization, and user behavior. (2) Anonymity set construction: the extracted patterns are then clustered across different webpages to form anonymity sets. In this example, each set contains at least $k = 3$ distinct patterns originating from at least $l = 2$ different webpages, thereby satisfying $k$-anonymity and $l$-diversity. A lightweight, cluster-specific regularization schedule is precomputed for each set.
  • Figure 4: Illustration of early anonymity set prediction and parameter switching. An incoming trace is initially regularized using the global parameters $(p_{\text{in}}^g = 0.012, p_{\text{out}}^g = 0.04)$ in this example. At each predefined safe timestamp, the classifier attempts to assign the trace to one of the candidate anonymity sets. In the first attempt, the classifier predicts $S_1$, but no switch occurs because $S_1$ is not valid at that timestamp. At the second safe timestamp, the trace is matched with $S_2$, which is an acceptable set at that point. This triggers a transition to $S_2$'s per-set lighter parameters $(p_{\text{in}}^{S_2} = 0.06, p_{\text{out}}^{S_2} = 0.08)$, which are applied for the rest of the connection. Each anonymity set is tied to a unique safe timestamp, so switching can occur at most once.
  • Figure 5: Distribution of per-trace overhead savings achieved by Adaptive Tamaraw over static Tamaraw for one representative global padding configuration: $(\rho_{\text{in}} = 0.012, \rho_{\text{out}} = 0.04)$, with $k = 7$ and $L = 100$. The red vertical line at 0% indicates the point where both methods incur equal overhead; values to the right indicate savings from adaptation, and those to the left indicate higher cost.
  • ...and 5 more figures
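
The captions for Figures 1, 3, and 4 describe three mechanical steps: building a TAM from a trace, checking that an anonymity set is $(k,l)$-diverse, and switching parameters at a safe timestamp. The following is a minimal Python sketch of those steps, not the authors' implementation. The `Packet` record, the function names, the dictionary layout, and the rule that a set is acceptable only at its own safe timestamp are all assumptions made for illustration; the early classifier itself is not modelled and its predictions are passed in as plain data.

```python
from collections import namedtuple

# Hypothetical packet record: time in seconds, direction +1 (outgoing) or -1 (incoming).
Packet = namedtuple("Packet", ["time", "direction"])


def build_tam(packets, n_slots=1000, slot_seconds=0.08):
    """Count packets per time slot; row 0 is outgoing, row 1 is incoming."""
    tam = [[0] * n_slots for _ in range(2)]
    for pkt in packets:
        if pkt.time < 0:
            continue  # ignore malformed timestamps
        slot = int(pkt.time / slot_seconds)
        if slot < n_slots:  # packets after the window are dropped
            row = 0 if pkt.direction > 0 else 1
            tam[row][slot] += 1
    return tam


def is_kl_diverse(anonymity_set, k, l):
    """Check a set of (page_id, pattern_id) pairs for k patterns from l pages."""
    patterns = set(anonymity_set)
    pages = {page for page, _ in patterns}
    return len(patterns) >= k and len(pages) >= l


def choose_parameters(predictions, safe_time_of_set, set_params, global_params):
    """Walk the classifier's predictions in time order and switch at most once.

    predictions: list of (safe_timestamp_index, predicted_set_name).
    safe_time_of_set: assumed mapping from set name to its single safe timestamp.
    Returns (switch_time, params); switch_time is None if no switch happened.
    """
    for t, predicted in predictions:
        # Assumption: a predicted set is acceptable only at its own safe timestamp.
        if safe_time_of_set.get(predicted) == t:
            return t, set_params[predicted]
    return None, global_params


if __name__ == "__main__":
    trace = [Packet(0.01, 1), Packet(0.05, -1), Packet(0.09, -1)]
    tam = build_tam(trace)
    print(tam[0][:3], tam[1][:3])

    print(is_kl_diverse([("a", 0), ("a", 1), ("b", 0)], k=3, l=2))

    # Values taken from the Figure 4 example; the safe-timestamp indices are made up.
    print(choose_parameters(
        predictions=[(1, "S1"), (2, "S2")],
        safe_time_of_set={"S1": 3, "S2": 2},
        set_params={"S1": (0.05, 0.07), "S2": (0.06, 0.08)},
        global_params=(0.012, 0.04),
    ))
```

In the usage at the bottom, the first prediction ($S_1$) is rejected because its assumed safe timestamp does not match, and the second ($S_2$) triggers the single switch to $S_2$'s parameters, mirroring the sequence described for Figure 4. The parameter values for $S_1$ are placeholders, since the caption does not give them.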

Theorems & Definitions (5)

  • Theorem 5.1: Global Non-Uniformly Weighted $\delta$-Non-Injectivity
  • Lemma E.1: Post-switch non-uniform weighted $\delta$
  • proof
  • Theorem E.2: Global non-uniformly weighted $\delta$-non-injectivity
  • proof