Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs
Jinmin Li, Kuofeng Gao, Yang Bai, Jingyun Zhang, Shu-Tao Xia
TL;DR
The paper addresses the risk of unauthorized video annotation by video-based LLMs and introduces Flow-based Video Watermarking, which applies an imperceptible perturbation $\Delta$ to a sparse set of frames selected via a flow-based mask $\mathbf{M}_f$. Multi-modal losses guide the perturbation so that the viewing experience is preserved while LLM comprehension is degraded. The method jointly optimizes a video-feature consistency loss $\ell_{video}$ and an LLM hidden-state consistency loss $\ell_{LLM}$ under a constraint on $\Delta$, using a flow-aware objective that emphasizes key frames. Experiments show that watermarking less than 20% of frames can significantly reduce CLIP scores, BLEU, ROUGE, and CIDEr metrics, and GPT-3.5/4 accuracy on ActivityNet-200 and MSVD-QA, outperforming baseline perturbations and transferring to black-box settings. This work provides a practical defense for video data privacy in multi-modal AI, with implications for safeguarding content against misuse by video-based LLMs.
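To make the sparse, masked perturbation concrete, here is a minimal sketch of the frame-selection and embedding steps only. It rests on several assumptions that are not taken from the paper: mean absolute frame difference stands in for optical flow, the bound on $\Delta$ is an $L_\infty$ bound `eps`, and the function names, the `ratio` default, and the `eps` default are hypothetical. The optimization of $\Delta$ against $\ell_{video}$ and $\ell_{LLM}$ is not shown; `delta` is simply passed in.

```python
import numpy as np

def motion_scores(video):
    """Per-frame motion proxy for a (T, H, W, C) video in [0, 1], T >= 2.

    Assumption: mean absolute frame difference replaces optical flow.
    """
    diffs = np.abs(np.diff(video, axis=0)).mean(axis=(1, 2, 3))  # shape (T-1,)
    # The first frame has no predecessor, so it reuses the first transition's score.
    return np.concatenate([diffs[:1], diffs])

def sparse_frame_mask(video, ratio=0.2):
    """Boolean mask over frames marking the floor(ratio * T) highest-motion frames."""
    num_frames = video.shape[0]
    k = max(1, int(np.floor(ratio * num_frames)))
    top = np.argsort(motion_scores(video))[-k:]
    mask = np.zeros(num_frames, dtype=bool)
    mask[top] = True
    return mask

def apply_watermark(video, delta, mask, eps=8 / 255):
    """Add delta to the masked frames only, clipped to [-eps, eps] (assumed L_inf bound)."""
    delta = np.clip(delta, -eps, eps)
    out = video.copy()
    out[mask] = np.clip(video[mask] + delta[mask], 0.0, 1.0)
    return out
```

In a full pipeline, `delta` would come from an iterative optimizer driven by the two consistency losses. The sketch only shows how a frame mask keeps the unselected frames bit-identical to the original and the selected frames within the perturbation budget.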
Abstract
The advent of video-based Large Language Models (LLMs) has significantly enhanced video understanding. However, it has also raised safety concerns regarding data protection, as videos can now be annotated more easily, even without authorization. This paper introduces Video Watermarking, a novel technique to protect videos from unauthorized annotation by such video-based LLMs, particularly annotation of the video content and description in response to specific queries. By imperceptibly embedding watermarks into key video frames with multi-modal flow-based losses, our method preserves the viewing experience while preventing misuse by video-based LLMs. Extensive experiments show that Video Watermarking significantly reduces the comprehensibility of videos for various video-based LLMs, demonstrating both stealth and robustness. In essence, our method provides a solution for securing video content, ensuring its integrity and confidentiality in the face of evolving video-based LLM technologies.
