ScreenMark: Watermarking Arbitrary Visual Content on Screen
Xiujian Liang, Gaozhi Liu, Yichao Si, Xiaoxiao Hu, Zhenxing Qian
TL;DR
This work addresses the challenge of protecting arbitrary Visual Screen Content (VSC) from leakage through screenshots, a scenario that traditional single-modality watermarking handles poorly. ScreenMark introduces a diffusion-inspired, three-stage progressive framework: it converts regular watermark information into irregular patterns, fuses them with screen content via an alpha-blending renderer, and then applies adaptive pre-training and enhancement fine-tuning. Key contributions include the three-stage training strategy, a newly collected 100k-screenshot dataset, and extensive experiments showing strong robustness, invisibility, and real-world applicability. The optimization losses are organized per stage as $L_{stage1}$, $L_{stage2}$, and $L_{stage3}$. The approach achieves real-time protection across diverse VSC modalities, matches or surpasses state-of-the-art single-modal baselines in screenshot scenarios, and enables practical deployment for secure screen content protection.
Abstract
Digital watermarking has shown its effectiveness in protecting multimedia content. However, existing watermarking methods are predominantly tailored to specific media types, rendering them less effective for protecting content displayed on computer screens, which is often multi-modal and dynamic. Visual Screen Content (VSC) is particularly susceptible to theft and leakage through screenshots, a vulnerability that current watermarking methods fail to adequately address. To address these challenges, we propose ScreenMark, a robust and practical watermarking method designed specifically for arbitrary VSC protection. ScreenMark utilizes a three-stage progressive watermarking framework. Initially, inspired by diffusion principles, we initialize the mutual transformation between regular watermark information and irregular watermark patterns. Subsequently, these patterns are integrated with screen content using a pre-multiplication alpha blending technique, supported by a pre-trained screen decoder for accurate watermark retrieval. The progressively complex distorter enhances the robustness of the watermark in real-world screenshot scenarios. Finally, the model undergoes fine-tuning guided by a joint-level distorter to ensure optimal performance. To validate the effectiveness of ScreenMark, we compiled a dataset comprising 100,000 screenshots from various devices and resolutions. Extensive experiments on different datasets confirm the superior robustness, imperceptibility, and practical applicability of the method.
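To make the blending step concrete, the following is a minimal sketch of generic pre-multiplied alpha compositing (the standard "over" operator), in which a watermark pattern is scaled by its opacity and laid over opaque screen content. It is not ScreenMark's renderer: the function names, the single global opacity, and the float RGB range of [0, 1] are all assumptions made for illustration.

```python
# Illustrative sketch only: generic pre-multiplied alpha compositing,
# not the renderer used by ScreenMark. All names here are hypothetical.
# Colours are RGB tuples of floats in [0, 1].


def premultiply(rgb, alpha):
    """Scale a colour by its alpha to obtain the pre-multiplied form."""
    return tuple(c * alpha for c in rgb)


def blend_over(premult_rgb, alpha, screen_rgb):
    """Composite a pre-multiplied watermark pixel over an opaque screen pixel.

    With pre-multiplied input, the "over" operator reduces to
    out = watermark_premult + screen * (1 - alpha).
    """
    return tuple(p + s * (1.0 - alpha) for p, s in zip(premult_rgb, screen_rgb))


def blend_image(pattern, alpha, screen):
    """Blend a watermark pattern over a screen image, pixel by pixel.

    Both images are lists of rows of RGB tuples with identical shape;
    alpha is a single global opacity (an assumption of this sketch).
    """
    return [
        [blend_over(premultiply(p, alpha), alpha, s) for p, s in zip(prow, srow)]
        for prow, srow in zip(pattern, screen)
    ]


if __name__ == "__main__":
    screen = [[(0.2, 0.4, 0.6), (1.0, 1.0, 1.0)]]
    pattern = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]]
    # A small opacity keeps the blended result close to the original screen.
    print(blend_image(pattern, 0.05, screen))
```

In this sketch, an opacity of 0 leaves the screen untouched and an opacity of 1 replaces it with the pattern. A small opacity trades visibility of the pattern against how much signal remains for a decoder to recover.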
