Unveiling the Potential: Harnessing Deep Metric Learning to Circumvent Video Streaming Encryption

Arwin Gansekoele, Tycho Bot, Rob van der Mei, Sandjai Bhulai, Mark Hoogendoorn

TL;DR

This work addresses the risk of side-channel information leakage from encrypted video streams under HTTPS by identifying videos without decrypting content. It introduces a deep metric learning framework based on triplet loss, augmented with an Outlier Leveraging extension, to achieve robust open-set detection that generalizes to unseen videos and scales to large video collections. Classification relies on embedding centroids per video and a centroid-based softmax, enabling quick inclusion of new videos without retraining. Empirical results show strong performance on small datasets, competitive results on larger ones, and partial transferability across browser settings, underscoring privacy risks and motivating defenses in streaming and encryption protocols.
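
The centroid-based classification described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`centroid`, `centroid_softmax`), the use of negative squared Euclidean distance as the softmax logit, and the toy 2-D embeddings are all assumptions made for the example. In practice the embeddings would come from the trained embedding model.

```python
import math

def centroid(embeddings):
    """Mean of a list of equal-length embedding vectors."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid_softmax(query, centroids):
    """Softmax over negative squared distances to each centroid.

    The choice of logit is an assumption for this sketch.
    """
    labels = list(centroids)
    logits = [-squared_distance(query, centroids[k]) for k in labels]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return {k: e / total for k, e in zip(labels, exps)}

# Toy 2-D embeddings standing in for the output of a trained model.
centroids = {
    "video_a": centroid([[0.0, 0.0], [0.2, 0.0]]),
    "video_b": centroid([[3.0, 3.0], [3.0, 3.2]]),
}

# Including a new video only requires a new centroid built from a few
# example streams ("shots"); no model parameters are changed.
centroids["video_c"] = centroid([[-3.0, 3.0], [-3.2, 3.0]])

probs = centroid_softmax([-2.9, 3.1], centroids)
prediction = max(probs, key=probs.get)
```

Because the classifier is defined entirely by the stored centroids, extending it to another video amounts to one dictionary insertion, which is what makes inclusion without retraining cheap.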

Abstract

The shift to HTTPS encryption has been an important step towards improving the privacy of internet users. However, a growing body of work shows that information can be extracted from encrypted internet traffic without decrypting it. Such attacks bypass the security guarantees that HTTPS is assumed to provide and thus need to be understood. Prior work showed that the variable bitrates of video streams are sufficient to identify which video someone is watching. These approaches generally have to make trade-offs in aspects such as accuracy, scalability and robustness, which complicates the practical use of these attacks. To that end, we propose a deep metric learning framework based on the triplet loss. Through this framework, we achieve robust, generalisable, scalable and transferable encrypted video stream detection. First, the triplet loss is better able to deal with streams of unknown videos not seen during training (open-set detection). Second, our approach can accurately classify videos not seen during training once they are added, without retraining the model. Third, we show that our method scales well to a dataset of over 1000 videos. Finally, we show that a model trained on video streams over Chrome can also classify streams over Firefox. Our results suggest that this side-channel attack is more broadly applicable than originally thought. We provide our code alongside a diverse and up-to-date dataset for future research.
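
For readers unfamiliar with the triplet loss named in the abstract, the sketch below shows its standard textbook form: the loss is zero once the negative example is at least `margin` farther from the anchor than the positive example. The Euclidean distance, the margin of 1.0 and the toy vectors are assumptions for illustration; the abstract does not specify the exact variant or hyperparameters used.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is a hypothetical choice for this sketch.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Anchor and positive: two streams of the same video.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]

# A negative that is already far away contributes no loss.
easy_negative = [4.0, 0.0]
easy = triplet_loss(anchor, positive, easy_negative)   # 1 - 4 + 1 < 0, clipped to 0

# A negative that is almost as close as the positive is penalised.
hard_negative = [0.0, 1.5]
hard = triplet_loss(anchor, positive, hard_negative)   # 1 - 1.5 + 1 = 0.5
```

Training on such triplets pulls streams of the same video together and pushes streams of different videos apart in the embedding space.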

Paper Structure

This paper contains 21 sections, 7 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Accuracy of models trained on 18 of the videos (a left-out fraction of 0.1), with a variable number of shots included for the left-out videos. Generating a centroid from more data appears to have a positive effect on accuracy.
  • Figure 2: Accuracy of models trained on 978 of the videos (a left-out fraction of 0.1), with a variable number of shots included for the left-out videos. Using more shots increases accuracy drastically, reaching over $80\%$ when all $8$ available shots are added, even though the model parameters are not touched when these videos are added.
  • Figure 3: Accuracy of different models when varying the number of videos left out. A left-out class fraction of $0.2$, for example, means that the model is trained on $870$ of the original $1087$ videos and the remaining $217$ are added during evaluation. Note that we do not retrain on these videos; we only add centroids for them and evaluate our models.
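
The class counts quoted in the captions can be reproduced with a small calculation. This is a sketch under two assumptions: the hypothetical helper `split_classes` rounds the held-out count to the nearest integer (the captions do not state the rounding rule), and the small dataset contains 20 videos, which is inferred from "18 of the videos" at a left-out fraction of 0.1.

```python
def split_classes(total_videos, left_out_fraction):
    """Return (trained_on, held_out) class counts for a left-out fraction.

    Rounding the held-out count to the nearest integer is an assumption.
    """
    held_out = round(total_videos * left_out_fraction)
    return total_videos - held_out, held_out

# Figure 3 example: a fraction of 0.2 on the 1087-video dataset.
large_02 = split_classes(1087, 0.2)   # 870 trained on, 217 held out

# Figure 2: a fraction of 0.1 on the same dataset gives 978 training videos.
large_01 = split_classes(1087, 0.1)

# Figure 1: a fraction of 0.1 on an assumed 20-video dataset gives 18.
small_01 = split_classes(20, 0.1)
```

The held-out videos are the ones whose centroids are added at evaluation time, so a larger left-out fraction tests how well the embedding space generalises to classes the model never trained on.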