Is My Text in Your AI Model? Gradient-based Membership Inference Test applied to LLMs
Gonzalo Mancera, Daniel DeAlcala, Julian Fierrez, Ruben Tolosana, Aythami Morales
TL;DR
This work applies a gradient-based Membership Inference Test (gMINT) to language models to determine whether specific text samples were used in training. A 3-layer auditing model is trained on the gradients $\nabla w$ of the audited LLMs and learns to distinguish training data $D$ from external data $E$. The evaluation covers seven Transformer models and six NLP datasets, about $2.5$ million sentences in total. Reported AUC values range from $0.70$ to $0.99$, with higher performance when more auditing data or larger models are used, and generally stronger performance in mixed-domain settings. The results position gMINT as a scalable tool for auditing data usage in NLP, supporting transparency, data protection, and ethical deployment of AI systems; the work also outlines practical limitations and avenues for future work on broader tasks and generative models.
Abstract
This work adapts the gradient-based Membership Inference Test (gMINT) to LLM-based text classification and studies its behavior in that setting. MINT is a general approach for determining whether given data was used to train a machine learning model; this work focuses on its application to Natural Language Processing. Using gradient-based analysis, the MINT model identifies whether particular data samples were included in the language model's training phase, addressing growing concerns about data privacy in machine learning. The method was evaluated on seven Transformer-based models and six datasets comprising over 2.5 million sentences, focusing on text classification tasks. Experimental results demonstrate MINT's robustness, with AUC scores between 85% and 99%, depending on data size and model architecture. These findings highlight MINT's potential as a scalable and reliable tool for auditing machine learning models, ensuring transparency, safeguarding sensitive data, and fostering ethical compliance in the deployment of AI/NLP technologies.
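The gradient-based idea can be illustrated with a toy sketch. This is not the authors' implementation; everything in it is an assumption made for illustration. The audited model is a small logistic regression rather than a Transformer, and the labels are random, so the model can only memorize, which exaggerates the gap between members and non-members. The trained 3-layer auditing model is replaced by a much simpler score, the negative norm of each sample's gradient. The function names (`train_logreg`, `per_sample_gradients`, `auc`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.5, steps=2000):
    """Toy audited model: logistic regression fitted by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def per_sample_gradients(w, X, y):
    """Gradient of the cross-entropy loss w.r.t. w, one row per sample.

    For logistic regression this is (p - y) * x in closed form.
    """
    return (sigmoid(X @ w) - y)[:, None] * X

def auc(member_scores, external_scores):
    """Probability that a member scores higher than a non-member (ties count 0.5)."""
    diff = member_scores[:, None] - external_scores[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

rng = np.random.default_rng(0)
d, n_train, n_ext = 50, 40, 200

# D: samples used to train the audited model; E: external samples never seen.
X_D = rng.normal(size=(n_train, d))
y_D = rng.integers(0, 2, size=n_train).astype(float)
X_E = rng.normal(size=(n_ext, d))
y_E = rng.integers(0, 2, size=n_ext).astype(float)

w = train_logreg(X_D, y_D)

# Gradient features that an auditing model would take as input.
g_D = per_sample_gradients(w, X_D, y_D)
g_E = per_sample_gradients(w, X_E, y_E)

# Stand-in for the auditing model: members tend to have smaller gradients,
# so the negative gradient norm serves as a membership score.
score_D = -np.linalg.norm(g_D, axis=1)
score_E = -np.linalg.norm(g_E, axis=1)

result = auc(score_D, score_E)
print(f"toy membership AUC: {result:.3f}")
```

In the setting described above, the gradient features would instead come from the audited LLM, and a learned auditing model would replace the norm-based score; the AUC computation is the only part that carries over unchanged.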
