Measuring Déjà vu Memorization Efficiently

Narine Kokhlikyan, Bargav Jayaraman, Florian Bordes, Chuan Guo, Kamalika Chaudhuri

TL;DR

This work tackles the challenge of measuring memorization in representation learning without retraining large models. It replaces the traditional two-model déjà vu setup with lightweight one-model reference strategies that estimate dataset-level correlations for both image representation models and vision-language models. Empirical results show that one-model tests closely align with two-model benchmarks in aggregate, which enables principled memorization assessment for open-source (OSS) models and reveals that OSS models generally memorize less than subset-trained counterparts. The proposed methods offer practical tools for privacy risk evaluation in pre-trained encoders and highlight complementary strengths across reference-model strategies. Code is released to facilitate adoption.

Abstract

Recent research has shown that representation learning models may accidentally memorize their training data. For example, the déjà vu method shows that for certain representation learning models and training images, it is sometimes possible to correctly predict the foreground label given only the representation of the background, and to do so better than dataset-level correlations alone would allow. However, this measurement method requires training two models: one to estimate dataset-level correlations and the other to estimate memorization. This two-model setup becomes infeasible for large open-source models. In this work, we propose simple alternative methods to estimate dataset-level correlations, and show that these can be used to approximate an off-the-shelf model's memorization ability without any retraining. This enables, for the first time, the measurement of memorization in pre-trained open-source image representation and vision-language representation models. Our results show that different ways of measuring memorization yield very similar aggregate results. We also find that open-source models typically have lower aggregate memorization than similar models trained on a subset of the data. The code is available for both vision and vision-language models.
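
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes the aggregate déjà vu score is the accuracy gap between the target model and a reference predictor when both guess the foreground label from the background alone. The function names `accuracy` and `dejavu_score` and the toy labels are hypothetical and are not taken from the released code.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of foreground labels predicted correctly."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and equally long")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def dejavu_score(
    target_preds: Sequence[int],
    reference_preds: Sequence[int],
    labels: Sequence[int],
) -> float:
    """Assumed aggregate score: target accuracy minus reference accuracy.

    target_preds: foreground labels predicted from the target model's
        representation of the background crop (the model under test).
    reference_preds: foreground labels predicted by a reference that only
        captures dataset-level correlations (for example, a classifier
        trained to map background crops to foreground labels).
    """
    return accuracy(target_preds, labels) - accuracy(reference_preds, labels)


# Toy data (hypothetical): four training images with foreground labels 0..3.
labels = [0, 1, 2, 3]
target_preds = [0, 1, 2, 0]     # 3 of 4 correct
reference_preds = [0, 1, 0, 0]  # 2 of 4 correct
score = dejavu_score(target_preds, reference_preds, labels)
```

Under this assumption, a score near zero suggests that the target model's predictions are explained by dataset-level correlations, while a clearly positive score points to memorization of the training images.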

Paper Structure

This paper contains 35 sections, 6 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Illustration of our one-model déjà vu test for image representation learning. The task is to predict the foreground object given a background crop. The original déjà vu test trains two models $\text{SSL}_A$ and $\text{SSL}_B$ on disjoint splits of the training set, and uses $\text{SSL}_B$ to quantify the degree of dataset-level correlation between the foreground and the background crop. Our one-model test replaces $\text{SSL}_B$ with a classifier that directly predicts the foreground given the background crop, and we show that both a ResNet50 network and a Naive Bayes classifier work well for this purpose.
  • Figure 2: Left: Population-level correlation accuracy scores across different models. The accuracies for the two-model tests are based on KNNs computed on top of VICReg, Barlow Twins and DINO representations. A ResNet50 and a Naive Bayes classifier are used for the one-model tests. The results show that ResNet50 and NB Top-2 are similar to both VICReg and Barlow Twins. Right: Corresponding Top-5 predicted dataset-level correlation classes and the per-class percentage of correlated examples.
  • Figure 3: Left: Pairwise sample-level agreement in measuring dataset-level correlations. Right: Examples where one-model tests (ResNet and Naive Bayes classifiers) succeed and two-model tests (KNN) fail, and vice versa. One-model tests learn the correlations between foreground and background better, since classifier training enforces them; however, they are less accurate when the relationship between foreground and background is ambiguous. Two-model tests, in contrast, are better at disambiguating foreground and background relationships. They, however, sometimes predict what is in the background rather than the foreground it is associated with.
  • Figure 4: Pairwise sample-level agreement (using Jaccard similarity for predicting correct objects) between the reference VLM $f_B$ in the previous two-model test and the GTE language model $g$. The heatmap shows that the agreement fractions for one-model and two-model tests are comparable.
  • Figure 5: Comparison of overall and Top 20% most confident déjà vu (DV) scores using one-model (ResNet Classifier, Naive Bayes w/ Top-k Crop Annotations (CA)) and two-model (KNN Classifier) tests for VICReg, Barlow Twins and DINO trained on a 300k subset of ImageNet.
  • ...and 10 more figures

Theorems & Definitions (2)

  • Definition 1: Déjà vu Memorization
  • Definition 2: Stability-based Memorization