Does CLIP Know My Face?

Dominik Hintersdorf, Lukas Struppek, Manuel Brack, Felix Friedrich, Patrick Schramowski, Kristian Kersting

TL;DR

This work introduces the Identity Inference Attack (IDIA), a privacy attack on vision-language models such as CLIP that tests whether an individual's data was included in training by pairing facial images with candidate name prompts. Given closed-box access to the model, a set of facial images, and the person's real name, the attack queries CLIP with multiple prompts across thousands of candidate names and aggregates the predictions to infer membership, achieving high true-positive rates with very low false-positive rates. Large-scale experiments on LAION-400M and CC3M demonstrate that CLIP memorizes faces and names: depending on the dataset, IDIA reaches a $\text{TPR}$ above 70% to above 95%, while the $\text{FPR}$ ranges from near zero to under 2%. The results imply a tangible privacy risk in multimodal models trained on web-scale data. They motivate using IDIA as a tool for measuring privacy leakage and as evidence in data-rights enforcement, and they highlight the need for countermeasures and ethical considerations when deploying such models.

Abstract

With the rise of deep learning in various applications, privacy concerns around the protection of training data have become a critical area of research. Whereas prior studies have focused on privacy risks in single-modal models, we introduce a novel method to assess privacy for multi-modal models, specifically vision-language models like CLIP. The proposed Identity Inference Attack (IDIA) reveals whether an individual was included in the training data by querying the model with images of the same person. When the model is asked to choose from a wide variety of possible text labels, its choice reveals whether it recognizes the person and, therefore, whether that person's data was used for training. Our large-scale experiments on CLIP demonstrate that individuals used for training can be identified with very high accuracy. We confirm that the model has learned to associate names with depicted individuals, implying the existence of sensitive information that can be extracted by adversaries. Our results highlight the need for stronger privacy protection in large-scale models and suggest that IDIAs can be used to prove the unauthorized use of data for training and to enforce privacy laws.

Paper Structure (22 sections, 4 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: Illustration of our Identity Inference Attack (IDIA). True-positive (✓) and true-negative (✗) predictions of whether individuals were part of the training data of CLIP. The IDIA was performed on a CLIP model trained on the Conceptual Captions 3M dataset (cc3m), in which each person appeared only 75 times among a total of 2.8 million image-text pairs. Images licensed under CC BY 2.0 (jimmy_fallon, kaley_cuoco, ben_stiller, adam_sandler, valerie_harper).
  • Figure 2: Identity Inference Attack (IDIA) examples for infrequently appearing celebrities. True-positive (✓) and true-negative (✗) predictions for European and American celebrities in the LAION-400M dataset (laion400m). The number in parentheses indicates how often the person appeared in the LAION-400M dataset, which contains 400 million image-text pairs. The IDIA was performed on a ViT-L/14 CLIP model trained on the LAION-400M dataset (open_clip). Images licensed under CC BY 3.0 (bernhard_hoecker, bettina_lamprecht, guido_cantz, max_giermann, carolin_kebekus) and CC BY 2.0 (ilene_kristen).
  • Figure 3: Identity Inference Attack (IDIA). Depiction of the workflow of our IDIA. Given different images and the name of a person, CLIP is queried with the images and multiple prompt templates containing possible names. After the query results are received, the name inferred for the majority of images is taken as the predicted name for each prompt. If the number of correct predictions over all the prompts is greater than or equal to $\tau$, the person is assumed to be in the training data. (Best viewed in color.)
  • Figure 4: IDIAs can already be executed with only a few samples. Depicted is the influence of the number of attack samples available to the adversary during the IDIA. The models were trained on the LAION-400M and the CC3M datasets. Plotted are the mean and standard deviation of the true-positive rate (TPR), false-negative rate (FNR), false-positive rate (FPR), true-negative rate (TNR), and accuracy (Acc). Metrics are computed by repeating the IDIA 20 times with a randomly sampled subset of attack samples. For the CC3M models, each individual was present 75 times in the dataset, while for the LAION-400M dataset each individual appeared fewer than 300 times.
  • Figure 5: IDIAs work even on individuals who appear very few times in the dataset. Depicted are the mean and standard deviation of the true-positive and false-negative rates of the IDIA for different numbers of training images per person. In this experiment, 30 samples were used to perform the attack. Additional plots for the other models can be found in App. \ref{app:additional_experimental_results}.
  • ...and 5 more figures
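
The decision rule described in the Figure 3 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the per-image name predictions have already been obtained from the model, and the function names `majority_name` and `idia_decision`, the placeholder names, and the example value of `tau` are all hypothetical. The caption does not say how ties between names are broken; the sketch takes the most frequent name (a plurality) and, on a tie, whichever name was seen first.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_name(names_per_image: Sequence[str]) -> Optional[str]:
    """Return the name predicted for the most images under one prompt template.

    Assumption: ties are resolved in favor of the name encountered first.
    """
    if not names_per_image:
        return None
    return Counter(names_per_image).most_common(1)[0][0]


def idia_decision(
    predictions: Sequence[Sequence[str]], true_name: str, tau: int
) -> bool:
    """Decide whether the person is assumed to be in the training data.

    predictions[p][i] is the name the model chose for image i when queried
    with prompt template p. The person counts as a training-set member if
    at least `tau` prompt templates yield the true name by majority vote.
    """
    correct_prompts = sum(
        1 for per_image in predictions if majority_name(per_image) == true_name
    )
    return correct_prompts >= tau


# Hypothetical model outputs: 3 prompt templates, 3 images each.
example_predictions = [
    ["Alice Example", "Alice Example", "Bob Sample"],   # majority: Alice Example
    ["Bob Sample", "Bob Sample", "Alice Example"],      # majority: Bob Sample
    ["Alice Example", "Carol Test", "Alice Example"],   # majority: Alice Example
]
is_member = idia_decision(example_predictions, "Alice Example", tau=2)
```

In this example two of the three prompt templates yield the true name, so the person is flagged as a member for `tau=2` but not for `tau=3`. A larger `tau` makes the rule stricter, trading true positives for fewer false positives.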