A General Framework for Data-Use Auditing of ML Models
Zonghao Huang, Neil Zhenqiang Gong, Michael K. Reiter
TL;DR
The paper addresses how a data owner can verify whether their content was used to train an ML model. It introduces a general, proactive auditing framework that turns black-box membership inference into a verifiable test with a tunable, controlled false-detection rate. The framework combines a task-agnostic data-marking scheme, a contrastive membership inference score, and a sequential detection procedure based on a prior-posterior-ratio martingale (PPRM) to detect data use without full access to the training data. The authors report strong, robust performance on both image classifiers and foundation models (visual encoders, Llama 2, CLIP), outperforming state-of-the-art baselines and remaining resilient to several adaptive countermeasures at modest utility cost. The work also discusses practical considerations, including multi-owner scenarios, the cost of foundation-model experiments, and pathways toward verifiable unlearning and third-party claim verification.
Abstract
Auditing the use of data in training machine-learning (ML) models is an increasingly pressing challenge, as myriad ML practitioners routinely leverage the effort of content creators to train models without their permission. In this paper, we propose a general method to audit an ML model for the use of a data-owner's data in training, without prior knowledge of the ML task for which the data might be used. Our method leverages any existing black-box membership inference method, together with a sequential hypothesis test of our own design, to detect data use with a quantifiable, tunable false-detection rate. We show the effectiveness of our proposed framework by applying it to audit data use in two types of ML models, namely image classifiers and foundation models.
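To make the idea of a sequential test with a tunable false-detection rate concrete, the following is a minimal sketch under stated assumptions, not the authors' procedure. It assumes the data owner holds two marked versions of each item, one published and one kept hidden, chosen at random, and that some black-box membership inference method returns a score for each version. If the model was not trained on the published data, the published version should score higher only about half the time. The sketch swaps in a simple betting martingale in place of the paper's own sequential test; the function name `sequential_audit` and the parameters `alpha` and `bet` are hypothetical.

```python
def sequential_audit(score_pairs, alpha=0.05, bet=0.5):
    """Simplified sequential data-use test (illustrative assumption only).

    score_pairs: iterable of (published_score, hidden_score) tuples, one per
        audited item, produced by any black-box membership inference method.
    alpha: target upper bound on the false-detection rate.
    bet: fraction of current wealth staked on each comparison, in (0, 1).

    Returns (detected, queries_used).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if not 0.0 < bet < 1.0:
        raise ValueError("bet must be in (0, 1)")

    wealth = 1.0             # nonnegative martingale under the null hypothesis
    threshold = 1.0 / alpha  # Ville's inequality: P(wealth ever >= 1/alpha) <= alpha
    queries = 0

    for published, hidden in score_pairs:
        queries += 1
        if published > hidden:
            wealth *= 1.0 + bet   # evidence for data use
        elif published < hidden:
            wealth *= 1.0 - bet   # evidence against data use
        # Ties leave the wealth unchanged.
        if wealth >= threshold:
            return True, queries  # stop early: data use detected
    return False, queries


if __name__ == "__main__":
    # Hypothetical scores in which the published version always scores higher.
    pairs = [(0.9, 0.1)] * 20
    print(sequential_audit(pairs, alpha=0.05))
```

Lowering `alpha` raises the stopping threshold, so detection needs more consistent evidence; that trade-off is the sense in which the false-detection rate is tunable in this sketch.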
