Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash

Lukas Struppek, Dominik Hintersdorf, Daniel Neider, Kristian Kersting

TL;DR

The paper addresses the risk of deploying deep perceptual hashing, exemplified by Apple's NeuralHash, for client-side CSAM detection. It presents an empirical analysis across four adversarial settings (hash collisions, gradient-based evasion, gradient-free transformations, and hash information leakage) and demonstrates substantial vulnerabilities: collision success rates near 90–100%, robust evasion with minimal perceptual changes, and measurable information leakage from 96-bit hashes. The main contributions are a comprehensive attack taxonomy, a quantitative evaluation showing non-robustness, and an account of the privacy and security implications for client-side scanning. The findings challenge the practicality of NeuralHash as a privacy-preserving, reliable detector on user devices and argue for caution or alternative approaches in real-world deployments. Overall, the work highlights the need for robust, privacy-preserving designs and regulatory considerations when deploying perceptual hashing-based detection systems on end-user devices.

Abstract

Apple recently revealed its deep perceptual hashing system NeuralHash to detect child sexual abuse material (CSAM) on user devices before files are uploaded to its iCloud service. Public criticism quickly arose regarding the protection of user privacy and the system's reliability. In this paper, we present the first comprehensive empirical analysis of deep perceptual hashing based on NeuralHash. Specifically, we show that current deep perceptual hashing may not be robust. An adversary can manipulate the hash values by applying slight changes to images, either induced by gradient-based approaches or simply by performing standard image transformations, forcing or preventing hash collisions. Such attacks allow malicious actors to easily exploit the detection system: from hiding abusive material to framing innocent users, everything is possible. Moreover, using the hash values, inferences can still be made about the data stored on user devices. In our view, based on our results, deep perceptual hashing in its current form is generally not ready for robust client-side scanning and should not be used from a privacy perspective.

Paper Structure

This paper contains 23 sections, 5 equations, 16 figures, 5 tables.

Figures (16)

  • Figure 1: NeuralHash pipeline as deployed on user devices. The pipeline consists of an embedding network and a locality-sensitive hashing (LSH) step. The embedding network maps the preprocessed images into an abstract feature representation vector. LSH then maps each vector into a specific bucket by checking its position relative to the hyperplanes defined in the hashing matrix.
  • Figure 2: Locality-sensitive hashing (LSH) scheme. Each hyperplane divides the space into two parts, with a bit state {0, 1} assigned to each side. The polytopes constrained by the hyperplanes are called buckets. Each bucket is assigned a unique binary hash code based on its relative position to each hyperplane. All data points in the same bucket are assigned the same hash code.
  • Figure 3: We manipulated the original image wikimediaProtest to have the same hash as the target image (Adversary 1). The manipulated image is visually hardly distinguishable from the original since the (normalized) differences are small. Still, the manipulated image is assigned the same hash as the visually completely different target image. This demonstrates the practicability and danger of hash collision attacks.
  • Figure 4: Additional collision results with their SSIM values between the original images (left) and the manipulated ones (right).
  • Figure 5: Visualization of our gradient-based evasion attacks (Adversary 2). The added perturbations are hardly visible; even changing a single pixel leads to a hash change. Normalized differences are visualized below, where black marks image parts that have not been modified. Zoom in for visual details.
  • ...and 11 more figures
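
The LSH step described in the captions of Figures 1 and 2 can be illustrated with a minimal sketch. It is not NeuralHash's actual implementation: the hashing matrix here is random, the hyperplanes are assumed to pass through the origin, and the embedding dimension of 128 is an assumption made for illustration (only the 96-bit hash length comes from the summary above). The helper names `lsh_hash`, `hamming_distance`, and `random_vector` are hypothetical.

```python
import random


def lsh_hash(embedding, hyperplanes):
    """Assign one bit per hyperplane: 1 if the embedding lies on the
    non-negative side of the hyperplane's normal vector, else 0."""
    bits = []
    for normal in hyperplanes:
        dot = sum(w * v for w, v in zip(normal, embedding))
        bits.append(1 if dot >= 0 else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hash codes differ."""
    return sum(x != y for x, y in zip(a, b))


def random_vector(rng, dim):
    """Draw a vector with independent standard-normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)
EMBED_DIM = 128  # assumed embedding size, for illustration only
HASH_BITS = 96   # hash length mentioned in the summary

# Stand-in for the hashing matrix: one random normal vector per hyperplane.
hyperplanes = [random_vector(rng, EMBED_DIM) for _ in range(HASH_BITS)]

x = random_vector(rng, EMBED_DIM)
x_scaled = [2.0 * v for v in x]   # same direction, so same bucket
x_opposite = [-v for v in x]      # opposite side of every hyperplane

h = lsh_hash(x, hyperplanes)
h_scaled = lsh_hash(x_scaled, hyperplanes)
h_opposite = lsh_hash(x_opposite, hyperplanes)

print("bits in hash:", len(h))
print("distance to scaled copy:", hamming_distance(h, h_scaled))
print("distance to negated copy:", hamming_distance(h, h_opposite))
```

In this sketch, two embeddings share a hash exactly when they fall on the same side of every hyperplane, which is the bucket notion from Figure 2. An embedding that sits close to a hyperplane needs only a small shift to flip the corresponding bit, which is the kind of sensitivity that the attacks summarized above exploit.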