Digital Fingerprinting on Multimedia: A Survey
Wendi Chen, Wensheng Gan, Philip S. Yu
TL;DR
This survey maps the field of multimedia digital fingerprinting, in which perceptual hashes serve as compact signatures for content identification, authentication, and management across text, image, video, and audio. It first covers traditional hash concepts, granularity, and similarity metrics, then details unimodal fingerprinting methods for each of the four modalities. A substantial portion is devoted to deep hashing, covering end-to-end learned hash codes for individual modalities as well as cross-modal approaches. The paper also surveys applications ranging from content retrieval and broadcast monitoring to client-side scanning (CSS) and intellectual property (IP) protection for large models, and discusses major challenges and future opportunities, including privacy concerns, composite and adversarial attacks, and the need for scalable, efficient solutions in real-world deployments.
Abstract
The explosive growth of multimedia content in the digital economy era has brought challenges in content recognition, copyright protection, and data management. As an emerging content management technology, perceptual hash-based digital fingerprints serve as compact summaries of multimedia content and have been widely adopted for efficient content identification and retrieval across different modalities (e.g., text, image, video, audio), attracting significant attention from both academia and industry. Despite the growing number of applications, a systematic and comprehensive literature review of multimedia digital fingerprints is still lacking. This survey aims to fill this gap and provide a resource for researchers studying the details of, and recent advances in, multimedia digital fingerprints. The survey first introduces the definition, characteristics, and related concepts of digital fingerprints, including hash functions, granularity, and similarity measures. It then analyzes and summarizes the algorithms for extracting unimodal fingerprints from different types of digital content: text, image, video, and audio. In particular, it provides an in-depth review of deep learning-based fingerprints. Additionally, the survey elaborates on the practical applications of digital fingerprints and outlines the challenges and potential future research directions. The goal is to promote the continued development of multimedia digital fingerprint research.
