Is My Data in Your AI? Membership Inference Test (MINT) applied to Face Biometrics

Daniel DeAlcala, Aythami Morales, Julian Fierrez, Gonzalo Mancera, Ruben Tolosana, Javier Ortega-Garcia

TL;DR

The paper presents the Membership Inference Test (MINT), an auditing framework for determining whether a specific data sample was used to train an AI model, demonstrated on state-of-the-art Face Recognition (FR) systems. It proposes two architectures, Vanilla MINT (an MLP with per-channel activation pooling) and CNN MINT (a CNN over activation maps), which use Auxiliary Auditable Data (AAD) and model embeddings to distinguish training data from external data. Evaluations across three FR models and six databases show up to $90\%$ accuracy, outperforming adapted Membership Inference Attacks (MIAs) and highlighting the potential for privacy and regulatory compliance in AI systems. The work also discusses practical deployment challenges, legal implications, and future directions, including gradient-based and unsupervised extensions and domains beyond face data.

Abstract

This article introduces the Membership Inference Test (MINT), a novel approach that aims to empirically assess if given data was used during the training of AI/ML models. Specifically, we propose two MINT architectures designed to learn the distinct activation patterns that emerge when an Audited Model is exposed to data used during its training process. These architectures are based on Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). The experimental framework focuses on the challenging task of Face Recognition, considering three state-of-the-art Face Recognition systems. Experiments are carried out using six publicly available databases, comprising over 22 million face images in total. Different experimental scenarios are considered depending on the context of the AI model to test. Our proposed MINT approach achieves promising results, with up to 90% accuracy, indicating the potential to recognize if an AI model has been trained with specific data. The proposed MINT approach can serve to enforce privacy and fairness in several AI applications, e.g., revealing if sensitive or private data was used for training or tuning Large Language Models (LLMs).
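
To make the "per-channel activation pooling" idea concrete, the following is a minimal sketch of how activation maps from an Audited Model might be turned into fixed-length features for an MLP-based MINT model. It is not the authors' implementation: the function names `pool_channels` and `build_mint_training_set` are hypothetical, the activations are toy nested lists rather than real convolutional outputs, and mean pooling is an assumption (the summary does not say which pooling operator is used, so `max` can be passed instead).

```python
from statistics import fmean


def pool_channels(activation_maps, reduce=fmean):
    """Collapse a C x H x W activation tensor (nested lists) into C scalars.

    `reduce` is applied to the flattened values of each channel. Mean pooling
    is an assumption for this sketch; pass `max` for max pooling.
    """
    return [reduce([v for row in channel for v in row])
            for channel in activation_maps]


def build_mint_training_set(member_maps, external_maps, reduce=fmean):
    """Pool each sample and label it: 1 = used in training, 0 = external."""
    features, labels = [], []
    for label, group in ((1, member_maps), (0, external_maps)):
        for maps in group:
            features.append(pool_channels(maps, reduce))
            labels.append(label)
    return features, labels


# Toy input: one "training" sample and one "external" sample, each with
# 2 channels of 2x2 activations (values are made up for illustration).
member = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]
external = [[[0.0, 1.0], [1.0, 0.0]], [[4.0, 4.0], [4.0, 4.0]]]

X, y = build_mint_training_set([member], [external])
```

The resulting `X` holds one vector per sample whose length equals the number of channels, which is the kind of input a binary MLP classifier could be trained on; the CNN variant would instead consume the unpooled maps.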

Paper Structure

This paper contains 24 sections, 5 figures, and 9 tables.

Figures (5)

  • Figure 1: The proposed Membership Inference Test (MINT) model is trained to predict whether a given data sample ($d$) was used during the training process of an AI Model ($M$), trained with a database ($D$).
  • Figure 2: The MINT task is represented graphically. On the left, we show the Audited Model Learned Space, illustrating two classes (0 and 1) each with samples used and not used in the Audited Model training. On the right, we present the feature space we aim to learn with our proposed MINT Model, where previous embeddings represented in the left plot will be projected so they become easily separable for the new binary classification task: used or not used (A or B) in the training of the Audited Model.
  • Figure 3: The Membership Inference Test (MINT) Model ($T$) is trained to predict whether a specific data sample ($d$) was used during the training process of an Audited AI/ML Model ($M$), which was previously trained with a database ($\mathcal{D}$). The input of the MINT Model is AAD (e.g., activation maps for data samples $d$) and/or the model outcome obtained from $M$.
  • Figure 4: Learning framework of the Vanilla MINT Model (a) and the CNN MINT Model (b) trained with the AAD obtained from the Convolutional Layer $i$ and/or the model outcome if possible.
  • Figure 5: ROC curves for the different MINT approaches on FR Model 1 (left), FR Model 2 (center), and FR Model 3 (right). $^{*}$We adapted the MIA approach used in rezaei2021difficulty, applying it to the Face Recognition Model outcome. Note that we used the original model instead of shadow models. $^{**}$We used the Vanilla MINT Model based on the combination of all convolutional layers. $^{***}$We used the CNN MINT Model trained with Conv Layer #1.
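
Since the results are reported as accuracy figures and ROC curves, it may help to see how such numbers can be derived from a membership classifier's scores. The sketch below is a generic, pure-Python illustration and not the paper's evaluation code: the helper names (`roc_points`, `auc`, `accuracy`), the 0.5 decision threshold, and the toy scores are all assumptions made for the example.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a threshold over scores.

    labels: 1 = sample was used in training, 0 = external sample.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one member and one external sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum(int(s >= threshold) == l for s, l in zip(scores, labels))
    return hits / len(labels)


# Made-up membership scores for four samples (two members, two external).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels)
```

Plotting the `curve` points for each method on shared axes would give a comparison in the style of Figure 5, and `accuracy` corresponds to the single operating point selected by the chosen threshold.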