STAR: Semantic-Traffic Alignment and Retrieval for Zero-Shot HTTPS Website Fingerprinting

Yifei Cheng, Yujia Zhu, Baiyang Li, Xinhao Deng, Yitong Cai, Yaochen Ren, Qingyun Liu

TL;DR

STAR reframes website fingerprinting under HTTPS as a zero-shot cross-modal retrieval problem by learning a shared embedding between encrypted traffic and crawl-time website logic. A dual-encoder architecture trained with contrastive and auxiliary losses enables zero-shot recognition without any target-site traffic, reaching 87.9% top-1 accuracy on 1,600 unseen sites and 0.963 AUC in open-world tests; a few-shot adapter using four labeled traces per site raises top-5 accuracy to 98.8%. The approach identifies intrinsic semantic leakage as a privacy risk in encrypted web traffic and generalizes through structure-aware augmentation and cross-modal alignment anchors. The authors release datasets and code to support reproducibility and future research on semantic inference under encrypted protocols.

Abstract

Modern HTTPS mechanisms such as Encrypted Client Hello (ECH) and encrypted DNS improve privacy but remain vulnerable to website fingerprinting (WF) attacks, where adversaries infer visited sites from encrypted traffic patterns. Existing WF methods rely on supervised learning with site-specific labeled traces, which limits scalability and fails to handle previously unseen websites. We address these limitations by reformulating WF as a zero-shot cross-modal retrieval problem and introducing STAR. STAR learns a joint embedding space for encrypted traffic traces and crawl-time logic profiles using a dual-encoder architecture. Trained on 150K automatically collected traffic-logic pairs with contrastive and consistency objectives and structure-aware augmentation, STAR retrieves the most semantically aligned profile for a trace without requiring target-side traffic during training. Experiments on 1,600 unseen websites show that STAR achieves 87.9 percent top-1 accuracy and 0.963 AUC in open-world detection, outperforming supervised and few-shot baselines. Adding an adapter with only four labeled traces per site further boosts top-5 accuracy to 98.8 percent. Our analysis reveals intrinsic semantic-traffic alignment in modern web protocols, identifying semantic leakage as the dominant privacy risk in encrypted HTTPS traffic. We release STAR's datasets and code to support reproducibility and future research.

Paper Structure

This paper contains 36 sections, 9 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of cross-modality alignment anchors in our setting. (A–C) illustrate three hypothesized alignment anchors between website semantic logic (left) and encrypted traffic behavior (right): request-side, response-side, and transport protocol. (D–F) present empirical support for each anchor via Pearson correlation or Wasserstein distance on representative samples.
  • Figure 2: Training-stage framework of STAR. Structure-aware logic–traffic sample pairs and labeled traffic samples are passed through the Logic Encoder and Traffic Encoder, whose weights are jointly learned so that (1) paired embeddings align via contrastive loss, (2) traffic embeddings support supervised classification, and (3) same-class traffic embeddings remain consistent.
  • Figure 3: Inference-stage framework of STAR. (a) Zero-Shot Retrieval: encode a test trace and match it against gallery logic embeddings; assign the top class if the similarity exceeds a threshold, otherwise reject as “Unknown.” (b) Few-Shot Linear Probe: train a linear classifier on few-shot traffic embeddings with the encoder frozen. (c) Few-Shot Tip-Adapter: fuse anchor-based logits from logic retrieval with k-NN logits from a few-shot traffic memory for final prediction.
  • Figure 4: Closed-world and open-world performance comparison. (a) Top-1 accuracy under different n-shot settings in the closed-world scenario. Zero-shot STAR is marked with a purple star, and the top-3 few-shot baselines are color-highlighted; others are shown in grey. (b) Precision-recall curves in the open-world 4-shot setting, comparing Zero-shot STAR with the top-3 baselines. AUC and best F1 scores are shown in the table.
  • Figure 5: Analysis of the STAR model. (a, b) t-SNE visualization and cosine similarity statistics of modality representations learned by TF (baseline) and STAR, respectively. (c) Gradient-based importance scores over input positions reveal that both modalities exhibit localized discriminative patterns. (d) Impact of training data scale on zero-shot classification accuracy, showing rapid performance saturation after 100k samples.
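
The inference modes described in the Figure 3 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes embeddings are already produced by the two encoders and are given as plain lists of floats, uses cosine similarity as the matching score, and borrows the exponential affinity form from the original Tip-Adapter formulation for the few-shot fusion. The function names, the `gallery` and `memory` structures, and the `alpha`, `beta`, and `threshold` values are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def zero_shot_retrieve(trace_emb, gallery, threshold):
    """Match a traffic embedding against gallery logic embeddings.

    gallery: dict mapping site label -> logic embedding (assumed layout).
    Returns (label, score); the label is "Unknown" when the best
    similarity does not exceed the threshold (open-world rejection).
    """
    best_label, best_score = None, -1.0
    for label, logic_emb in gallery.items():
        score = cosine(trace_emb, logic_emb)
        if score > best_score:
            best_label, best_score = label, score
    if best_label is None or not best_score > threshold:
        return "Unknown", best_score
    return best_label, best_score


def tip_adapter_scores(trace_emb, gallery, memory, alpha=1.0, beta=5.0):
    """Fuse logic-retrieval scores with scores from a few-shot traffic memory.

    memory: list of (traffic embedding, label) pairs (assumed layout).
    Each memory entry adds alpha * exp(-beta * (1 - cosine)) to its class,
    so only near-identical traces contribute strongly.
    """
    scores = {label: cosine(trace_emb, emb) for label, emb in gallery.items()}
    for emb, label in memory:
        affinity = math.exp(-beta * (1.0 - cosine(trace_emb, emb)))
        scores[label] = scores.get(label, 0.0) + alpha * affinity
    return scores


# Toy two-site gallery with hand-written embeddings (illustrative only).
gallery = {"site_a": [1.0, 0.0], "site_b": [0.0, 1.0]}
known = zero_shot_retrieve([0.9, 0.1], gallery, threshold=0.5)
rejected = zero_shot_retrieve([-1.0, -1.0], gallery, threshold=0.5)
fused = tip_adapter_scores([0.8, 0.6], gallery, [([0.8, 0.6], "site_b")])
```

In this toy run, the first query is assigned to `site_a`, the second is rejected as "Unknown" because no gallery entry is similar enough, and the single few-shot memory entry is enough to move the fused prediction from `site_a` to `site_b`. In practice the threshold would be tuned on held-out data to trade precision against recall.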