Unveiling the Potential: Harnessing Deep Metric Learning to Circumvent Video Streaming Encryption
Arwin Gansekoele, Tycho Bot, Rob van der Mei, Sandjai Bhulai, Mark Hoogendoorn
TL;DR
This work addresses the risk of side-channel information leakage from encrypted video streams under HTTPS by identifying videos without decrypting content. It introduces a deep metric learning framework based on triplet loss, augmented with an Outlier Leveraging extension, to achieve robust open-set detection that generalizes to unseen videos and scales to large video collections. Classification relies on embedding centroids per video and a centroid-based softmax, enabling quick inclusion of new videos without retraining. Empirical results show strong performance on small datasets, competitive results on larger ones, and partial transferability across browser settings, underscoring privacy risks and motivating defenses in streaming and encryption protocols.
Abstract
The shift to HTTPS has been an important step towards improving the privacy of internet users. However, there is a growing body of work on extracting information from encrypted internet traffic without having to decrypt it. Such attacks bypass the security guarantees that HTTPS is assumed to provide and thus need to be understood. Prior works showed that the variable bitrates of video streams are sufficient to identify which video someone is watching. These works generally have to make trade-offs between aspects such as accuracy, scalability and robustness, and these trade-offs complicate the practical use of the attacks. To that end, we propose a deep metric learning framework based on the triplet loss method. Through this framework, we achieve robust, generalisable, scalable and transferable encrypted video stream detection. First, the triplet loss is better able to deal with streams of videos not seen during training, which makes open-set detection more robust. Second, our approach can accurately classify videos not seen during training. Third, we show that our method scales well to a dataset of over 1000 videos. Finally, we show that a model trained on video streams over Chrome can also classify streams over Firefox. Our results suggest that this side-channel attack is more broadly applicable than originally thought. We provide our code alongside a diverse and up-to-date dataset for future research.
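To make the two central ingredients concrete, the following is a minimal NumPy sketch of a triplet loss and of classification against per-video embedding centroids. It is an illustration under stated assumptions, not the authors' implementation: the squared Euclidean distance, the margin value, the use of negative distances as softmax logits, and all function names (`triplet_loss`, `class_centroids`, `centroid_softmax`) are hypothetical choices made for this example. The embedding network itself is omitted; the inputs are assumed to be embeddings it has already produced.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss over a batch of embeddings.

    Assumption: squared Euclidean distance and margin=1.0 are
    illustrative choices, not values taken from the paper.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # Penalise triplets where the positive is not closer than the
    # negative by at least `margin`.
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


def class_centroids(embeddings, labels):
    """Return the sorted class ids and the mean embedding of each class."""
    classes = np.unique(labels)
    centroids = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in classes]
    )
    return classes, centroids


def centroid_softmax(query, centroids):
    """Softmax over negative squared distances to each centroid.

    Assumption: using negative squared distance as the logit is one
    plausible reading of a "centroid-based softmax".
    """
    diffs = query[:, None, :] - centroids[None, :, :]
    logits = -np.sum(diffs ** 2, axis=2)
    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Toy embeddings for two videos (labels 0 and 1), two streams each.
embeddings = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])
classes, centroids = class_centroids(embeddings, labels)

# A new stream whose embedding lies near the first video's centroid.
query = np.array([[0.1, 0.4]])
probs = centroid_softmax(query, centroids)
predicted = classes[probs.argmax(axis=1)]
```

In this sketch, adding a new video amounts to computing one more centroid from a few embedded streams and appending it to `centroids`; the embedding model is left untouched, which is the property the summary describes as inclusion without retraining.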
