SilhouetteTell: Practical Video Identification Leveraging Blurred Recordings of Video Subtitles
Guanchong Huang, Song Fang
TL;DR
SilhouetteTell addresses the privacy threat of identifying the videos a person watches by exploiting the spatiotemporal patterns of subtitle silhouettes captured in blurry screen recordings. It introduces a two-phase approach: first training a Mask R-CNN to extract subtitle silhouettes, then inferring the watched video via a robust demodulation of silhouette sequences against a subtitle library. The attack achieves high top-k accuracy even at long recording distances. The work provides extensive evaluations across 300 videos and multiple devices, demonstrating robustness to distance, angle, playback speed, and partial occlusions, and shows that traditional OCR, CRNN, and CLIP methods fail under blur. It also discusses defenses such as subtitle cancellation, silhouette obfuscation, and privacy screens, and points to possible extensions to other on-screen text-based fingerprints. The findings emphasize that non-invasive, traffic-independent side-channel attacks can reveal sensitive viewing histories, motivating both broader threat awareness and the development of concrete mitigations.
Abstract
Video identification attacks pose a significant privacy threat: they can reveal the videos that victims watch, which may disclose their hobbies, religious beliefs, political leanings, sexual orientation, and health status. Video watching history can also be used for user profiling or advertising and may result in cyberbullying, discrimination, or blackmail. Existing video inference techniques usually depend on analyzing the network traffic generated by streaming online videos. In this work, we observe that the content of a subtitle determines its silhouette displayed on the screen, and that identifying each subtitle silhouette also yields the time difference between two consecutive subtitles. We then propose SilhouetteTell, a novel video identification attack that combines spatial-domain and time-domain information into a spatiotemporal feature of subtitle silhouettes. SilhouetteTell explores the spatiotemporal correlation between the recorded subtitle silhouettes of a video and its subtitle file. It can infer both online and offline videos. Comprehensive experiments on off-the-shelf smartphones confirm the high efficacy of SilhouetteTell for inferring video titles and clips under various settings, including from a distance of up to 40 meters.
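To make the spatiotemporal idea concrete, the following is a minimal illustrative sketch, not the authors' pipeline. It assumes each subtitle has already been reduced to an onset time and a silhouette width (measured in pixels for a recording, approximated by character count for a subtitle file), and it ranks candidate videos by a simple sliding-window distance. The `Cue` class, the `gap_weight` parameter, the distance function, and the toy library are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cue:
    start: float  # subtitle onset in seconds
    width: float  # silhouette width (pixels, or character count as a rough proxy)


def window_features(cues: List[Cue]) -> Tuple[List[float], List[float]]:
    """Return scale-free widths and the time gaps between consecutive cues."""
    total = sum(c.width for c in cues)
    widths = [c.width / total for c in cues]  # removes the pixel/character scale
    gaps = [b.start - a.start for a, b in zip(cues, cues[1:])]
    return widths, gaps


def window_distance(observed: List[Cue], candidate: List[Cue],
                    gap_weight: float = 0.1) -> float:
    """Sum of spatial and (weighted) temporal mismatches; assumed, not from the paper."""
    obs_w, obs_g = window_features(observed)
    cand_w, cand_g = window_features(candidate)
    spatial = sum(abs(a - b) for a, b in zip(obs_w, cand_w))
    temporal = sum(abs(a - b) for a, b in zip(obs_g, cand_g))
    return spatial + gap_weight * temporal


def rank_videos(observed: List[Cue],
                library: Dict[str, List[Cue]]) -> List[Tuple[str, float]]:
    """Slide the observed clip over each video's cues and keep the best distance."""
    n = len(observed)
    scores = []
    for title, cues in library.items():
        if len(cues) < n:
            continue  # video has fewer subtitles than the observed clip
        best = min(window_distance(observed, cues[i:i + n])
                   for i in range(len(cues) - n + 1))
        scores.append((title, best))
    return sorted(scores, key=lambda item: item[1])


# Toy subtitle library: widths are character counts (hypothetical data).
library = {
    "video_a": [Cue(0.0, 12), Cue(2.5, 30), Cue(6.0, 8), Cue(7.0, 22), Cue(11.0, 15)],
    "video_b": [Cue(0.0, 20), Cue(3.0, 20), Cue(6.0, 20), Cue(9.0, 20), Cue(12.0, 20)],
}

# Observed clip: three silhouettes from the middle of video_a, widths in pixels,
# with timestamps relative to an arbitrary recording start.
observed = [Cue(102.5, 300), Cue(106.0, 80), Cue(107.0, 220)]

ranking = rank_videos(observed, library)
print(ranking[0][0])
```

Normalizing widths within each window makes the comparison independent of recording distance and zoom, and using gaps rather than absolute timestamps makes it independent of where in the video the recording starts. A real attack would need a far more tolerant matcher, since character count only loosely predicts rendered width and silhouettes extracted from blurry footage are noisy.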
