Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing
Shuai Li, Jie Zhang, Yuang Qi, Kejiang Chen, Tianwei Zhang, Weiming Zhang, Nenghai Yu
TL;DR
The paper introduces PADHASH, a data poisoning attack against deep hashing for large-scale image retrieval. It shows that clean query images can trigger malicious target retrievals without any modification of the queries themselves. PADHASH trains a surrogate model to imitate the victim, then uses strict gradient matching to generate poisoned images; once these are injected into the training set, the victim model retrieves target images for clean queries. The attack is shown to be effective, transferable across hash methods and datasets, and stealthy, since the victim's mean average precision (mAP) is preserved. On CIFAR-10, ImageNet100, and MSCOCO, PADHASH achieves high attack success rates at very low poison ratios and remains feasible in gray-box and black-box settings, highlighting practical security risks in deep hashing systems. The work closes with defense considerations (poisoned-data detection, data augmentation, and robust training) and potential applications such as model fingerprinting and copyright verification, emphasizing the need for security-aware design in multimedia retrieval pipelines.
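The stealthiness claim rests on mAP staying roughly unchanged. As an illustration of the metric only, the sketch below shows one common way to compute mAP for hash-based retrieval: rank the database by Hamming distance to each query code, then average the precision at every relevant hit. It is not the paper's evaluation code, and it makes several assumptions: codes are equal-length lists of ±1, a database item counts as relevant if it shares at least one label with the query (which also covers multi-label data such as MSCOCO), ties in Hamming distance keep database order, and mAP is taken over the full ranking rather than a top-k cutoff. The function names are hypothetical.

```python
from typing import List, Sequence, Set

Code = Sequence[int]


def hamming(a: Code, b: Code) -> int:
    """Number of bit positions where two equal-length hash codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code: Code, query_labels: Set[int],
                      db_codes: List[Code], db_labels: List[Set[int]]) -> float:
    # Rank the whole database by Hamming distance; sorted() is stable, so ties keep database order.
    order = sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if query_labels & db_labels[idx]:  # relevant if at least one label is shared
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_codes: List[Code], query_labels: List[Set[int]],
                           db_codes: List[Code], db_labels: List[Set[int]]) -> float:
    aps = [average_precision(code, labels, db_codes, db_labels)
           for code, labels in zip(query_codes, query_labels)]
    return sum(aps) / len(aps)


# Toy database: four 4-bit codes with single labels.
db_codes = [[1, 1, 1, 1], [1, 1, 1, -1], [-1, -1, -1, -1], [-1, -1, 1, 1]]
db_labels = [{0}, {0}, {1}, {1}]
print(mean_average_precision([[1, 1, 1, 1], [-1, -1, -1, -1]], [{0}, {1}], db_codes, db_labels))  # 1.0
```

Computing this value for a model trained on clean data and again for one trained with poisons injected is one simple way to check whether an attack leaves normal retrieval quality intact.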
Abstract
Large-scale image retrieval using deep hashing has become increasingly popular due to the exponential growth of image data and the remarkable feature extraction capabilities of deep neural networks (DNNs). However, deep hashing methods are vulnerable to malicious attacks, including adversarial and backdoor attacks. These attacks typically require altering the query images, which is often impractical in real-world scenarios. In this paper, we point out that even clean query images can be dangerous, inducing malicious target retrieval results such as undesired or illegal images. To the best of our knowledge, we are the first to study data poisoning attacks against deep hashing (PADHASH). Specifically, we first train a surrogate model to simulate the behavior of the target deep hashing model. Then, a strict gradient matching strategy is proposed to generate the poisoned images. Extensive experiments on different models, datasets, hash methods, and hash code lengths demonstrate the effectiveness and generality of our attack method.
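The abstract does not spell out the strict gradient matching objective, so the sketch below shows only the generic gradient-matching idea used in earlier clean-label poisoning work: perturb the poison images so that the training gradient they induce on the surrogate points in the same direction as the gradient of the attacker's goal, measured by cosine similarity. Everything here is a toy assumption rather than PADHASH itself: the surrogate is a linear map W whose outputs are pulled toward ±1 codes by a squared loss, the attacker's goal is to make a clean query map to the target's code, the perturbation is found with central finite differences instead of autodiff, and it is kept in an L-infinity ball of radius eps by clipping (real images would also be clipped to the valid pixel range). All names (code_loss_grad, matching_loss, craft_poisons) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def code_loss_grad(W: Matrix, x: Vector, code: Vector) -> Vector:
    """Gradient w.r.t. W of the toy loss 0.5 * ||W x - code||^2, flattened row by row.

    For this squared loss the gradient is the outer product (W x - code) x^T.
    """
    residual = [sum(w * xj for w, xj in zip(row, x)) - c for row, c in zip(W, code)]
    return [r * xj for r in residual for xj in x]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def matching_loss(W: Matrix, x_query: Vector, target_code: Vector,
                  poisons: Matrix, poison_codes: Matrix, deltas: Matrix) -> float:
    """1 - cos(attacker's gradient, summed training gradient of the perturbed poisons)."""
    g_adv = code_loss_grad(W, x_query, target_code)
    g_poison = [0.0] * len(g_adv)
    for x_p, b_p, d in zip(poisons, poison_codes, deltas):
        g = code_loss_grad(W, [xi + di for xi, di in zip(x_p, d)], b_p)
        g_poison = [s + gi for s, gi in zip(g_poison, g)]
    return 1.0 - cosine(g_adv, g_poison)


def craft_poisons(W: Matrix, x_query: Vector, target_code: Vector,
                  poisons: Matrix, poison_codes: Matrix,
                  eps: float = 0.1, lr: float = 0.1, steps: int = 50,
                  h: float = 1e-4) -> Tuple[Matrix, float]:
    """Projected gradient descent on the perturbations; returns the best ones found."""
    def loss(ds: Matrix) -> float:
        return matching_loss(W, x_query, target_code, poisons, poison_codes, ds)

    deltas = [[0.0] * len(x) for x in poisons]
    best, best_loss = [d[:] for d in deltas], loss(deltas)
    for _ in range(steps):
        grads = []
        for i, d in enumerate(deltas):
            row = []
            for j in range(len(d)):
                # Central finite difference for d(loss)/d(delta[i][j]).
                plus = [dd[:] for dd in deltas]
                minus = [dd[:] for dd in deltas]
                plus[i][j] += h
                minus[i][j] -= h
                row.append((loss(plus) - loss(minus)) / (2 * h))
            grads.append(row)
        # Gradient step, then clip each entry back into [-eps, eps] (L-infinity projection).
        deltas = [[max(-eps, min(eps, dj - lr * gj)) for dj, gj in zip(d, g)]
                  for d, g in zip(deltas, grads)]
        current = loss(deltas)
        if current < best_loss:
            best, best_loss = [d[:] for d in deltas], current
    return best, best_loss


# Toy usage: one 2-D query, one 2-D poison image, 2-bit codes.
W = [[0.5, -0.5], [0.0, 0.0]]
deltas, best_loss = craft_poisons(W, [1.0, 0.0], [-1.0, 1.0], [[0.0, 1.0]], [[1.0, 1.0]])
```

In a realistic setting, W would be a deep surrogate network trained on data resembling the victim's, gradients would come from automatic differentiation, and the crafted images would then be added to the victim's training set so that ordinary training reproduces the attacker's gradient direction.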
