DarkHash: A Data-Free Backdoor Attack Against Deep Hashing
Ziqi Zhou, Menghao Deng, Yufei Song, Hangtao Zhang, Wei Wan, Shengshan Hu, Minghui Li, Leo Yu Zhang, Dezhong Yao
TL;DR
DarkHash introduces the first data-free backdoor attack against deep hashing for image retrieval, using a surrogate dataset and a shadow target strategy. It freezes the shallow layers and fine-tunes the higher layers with three objectives: preserve benign retrieval ($\mathcal{J}_{ben}$), implant backdoor functionality ($\mathcal{J}_{bac}$), and pull poisoned samples together with their neighbors toward a shadow target via topological alignment ($\mathcal{J}_{tpa}$). The method achieves high attack performance, with $t$-mAP typically exceeding 80% across 120 configurations while maintaining comparable mAP on benign data, and it remains robust under several defenses, including fine-tuning, pruning, Neural Cleanse, STRIP, and SentiNet. These results underscore the practicality of data-free backdoors in deep hashing and highlight the need for specialized defenses for retrieval-based models.
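To make the two reported metrics concrete, the following is a minimal sketch of how mAP and $t$-mAP are commonly computed for Hamming-ranked retrieval. It is not taken from the paper: it assumes single-label data, binary codes stored as lists of 0/1, and that $t$-mAP is simply mAP evaluated on triggered queries with the attacker's target label treated as the relevant label. All function names and the toy database are hypothetical.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code: Sequence[int], relevant_label: str,
                      db_codes: List[Sequence[int]], db_labels: List[str]) -> float:
    """Rank the database by Hamming distance to the query, then average the
    precision at every rank where an item with the relevant label appears."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if db_labels[i] == relevant_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_ap(query_codes, relevant_labels, db_codes, db_labels) -> float:
    """Mean of the per-query average precisions."""
    aps = [average_precision(q, lab, db_codes, db_labels)
           for q, lab in zip(query_codes, relevant_labels)]
    return sum(aps) / len(aps)


# Toy database (assumption): two "cat" codes and two "dog" codes.
db_codes = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1], [1, 1, 1, 0]]
db_labels = ["cat", "cat", "dog", "dog"]

# Benign mAP: clean queries are scored against their true labels.
benign_map = mean_ap([[0, 0, 0, 0]], ["cat"], db_codes, db_labels)

# t-mAP: triggered queries are scored against the attacker's target label.
# A successful backdoor moves the query's code next to the "dog" codes ...
t_map_success = mean_ap([[1, 1, 1, 1]], ["dog"], db_codes, db_labels)
# ... while a failed one leaves the code near the original "cat" codes.
t_map_failure = mean_ap([[0, 0, 0, 0]], ["dog"], db_codes, db_labels)
```

Under these assumptions, a strong attack shows up as a high $t$-mAP on triggered queries together with a benign mAP close to that of the unmodified model.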
Abstract
Benefiting from its superior feature learning capabilities and efficiency, deep hashing has achieved remarkable success in large-scale image retrieval. Recent studies have demonstrated the vulnerability of deep hashing models to backdoor attacks. Although these studies have shown promising attack results, they rely on access to the training dataset to implant the backdoor. In the real world, obtaining such data (e.g., identity information) is often prohibited due to privacy protection and intellectual property concerns. Embedding backdoors into deep hashing models without access to the training data, while maintaining retrieval accuracy on the original task, presents a novel and challenging problem. In this paper, we propose DarkHash, the first data-free backdoor attack against deep hashing. Specifically, we design a novel shadow backdoor attack framework with dual-semantic guidance. It embeds backdoor functionality and maintains the original retrieval accuracy by fine-tuning only specific layers of the victim model on a surrogate dataset. We also leverage the relationship between individual samples and their neighbors to strengthen the backdoor during training: a topological alignment loss optimizes both individual poisoned samples and their neighboring poisoned samples toward the target sample, further enhancing the attack capability. Experimental results on four image datasets, five model architectures, and two hashing methods demonstrate the high effectiveness of DarkHash, which outperforms existing state-of-the-art backdoor attack methods. Defense experiments show that DarkHash can withstand existing mainstream backdoor defense methods.
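The idea of optimizing a poisoned sample together with its neighbors toward the target can be illustrated with a small sketch. The abstract does not give the form of the topological alignment loss, so everything below is an assumption made for illustration: codes are relaxed (real-valued) hash outputs, neighbors are chosen by cosine similarity among the poisoned codes, and the penalty is the mean of $1 - \cos$ to the target code. The names `cosine`, `k_nearest`, and `alignment_loss` are hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def k_nearest(index: int, codes: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k codes most cosine-similar to codes[index], excluding itself."""
    others = [j for j in range(len(codes)) if j != index]
    others.sort(key=lambda j: cosine(codes[index], codes[j]), reverse=True)
    return others[:k]


def alignment_loss(poisoned_codes: List[Sequence[float]],
                   target_code: Sequence[float], k: int = 1) -> float:
    """Mean (1 - cosine) distance to the target, taken over every poisoned
    code and its k nearest poisoned neighbors (illustrative stand-in only)."""
    total, count = 0.0, 0
    for i in range(len(poisoned_codes)):
        group = [i] + k_nearest(i, poisoned_codes, k)
        for j in group:
            total += 1.0 - cosine(poisoned_codes[j], target_code)
            count += 1
    return total / count


# Toy relaxed codes (assumption): 4-bit outputs in [-1, 1].
target = [1.0, 1.0, -1.0, -1.0]
aligned = [[1.0, 1.0, -1.0, -1.0], [0.9, 0.8, -0.9, -1.0]]
misaligned = [[-1.0, -1.0, 1.0, 1.0], [-0.9, -0.8, 0.9, 1.0]]

loss_aligned = alignment_loss(aligned, target, k=1)
loss_misaligned = alignment_loss(misaligned, target, k=1)
```

In this toy setup the loss is near zero when the poisoned codes already sit on the target and approaches 2 when they point the opposite way, so minimizing it drags each sample and its neighborhood toward the target code. In an actual attack such a term would be combined with the benign-retrieval and backdoor terms and minimized by gradient descent over the unfrozen layers.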
