Fast Training Dataset Attribution via In-Context Learning
Milad Fotouhi, Mohammad Taha Bahadori, Oluwaseyi Feyisetan, Payman Arabshahi, David Heckerman
TL;DR
This work tackles training data attribution (TDA) for instruction-tuned LLMs using in-context learning and prompt engineering. It introduces two complementary methods: SCM, a non-parametric Shapley-context approach, and CMF, a semi-parametric context mixture model cast as a matrix-factorization problem and solved via alternating projected least squares. CMF is more robust to retrieval noise and yields more reliable attribution, capturing base-model contributions and dataset-specific effects without explicit latent distributions. In experiments on BoolQ, FakeQ, and Olympic2024 across multiple models, the authors show that CMF and SCM can quantify dataset influence and evaluate unlearning techniques, with CMF offering favorable runtime and performance. The findings suggest practical value for data curation, auditing, and data-influence assessment in real-world, retrieval-augmented systems.
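To make "alternating projected least squares" concrete, here is a minimal sketch of the general technique. It is not the authors' implementation. It assumes a matrix Y of observed output distributions (one row per prompt) is approximated by W @ H, where each row of W holds mixture weights and each row of H is a component distribution. The function names, the rank k, and the choice to constrain both factors to the probability simplex are illustrative assumptions.

```python
import numpy as np

def project_rows_to_simplex(M):
    """Euclidean projection of each row of M onto the probability simplex."""
    n, k = M.shape
    u = np.sort(M, axis=1)[:, ::-1]          # each row sorted in descending order
    css = np.cumsum(u, axis=1) - 1.0
    idx = np.arange(1, k + 1)
    rho = (u - css / idx > 0).sum(axis=1)    # size of the active set per row (>= 1)
    theta = css[np.arange(n), rho - 1] / rho
    return np.maximum(M - theta[:, None], 0.0)

def alternating_projected_ls(Y, k, n_iter=100, seed=0):
    """Approximate Y (n x v) by W (n x k) @ H (k x v), rows of W and H on the simplex.

    Each half-step solves an unconstrained least-squares problem for one
    factor with the other held fixed, then projects the result back onto
    the simplex. This is a heuristic; convergence is not guaranteed.
    """
    rng = np.random.default_rng(seed)
    n, v = Y.shape
    W = project_rows_to_simplex(rng.random((n, k)))
    H = project_rows_to_simplex(rng.random((k, v)))
    for _ in range(n_iter):
        # W step: minimize ||Y - W H||_F over W, i.e. solve H^T W^T = Y^T.
        W = np.linalg.lstsq(H.T, Y.T, rcond=None)[0].T
        W = project_rows_to_simplex(W)
        # H step: minimize ||Y - W H||_F over H.
        H = np.linalg.lstsq(W, Y, rcond=None)[0]
        H = project_rows_to_simplex(H)
    return W, H

# Synthetic usage: 20 prompts, 5 outcomes, 3 hypothetical sources.
rng = np.random.default_rng(1)
W_true = project_rows_to_simplex(rng.random((20, 3)))
H_true = project_rows_to_simplex(rng.random((3, 5)))
W_est, H_est = alternating_projected_ls(W_true @ H_true, k=3)
```

In a sketch like this, row i of `W_est` would be read as the estimated share of each source in the output for prompt i. How the paper defines the sources and fixes or estimates the component distributions is not specified in this summary.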
Abstract
We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data to the outputs of instruction-tuned large language models (LLMs). We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without provided context, and (2) a mixture distribution model approach that frames the problem of identifying contribution scores as a matrix factorization task. Our empirical comparison demonstrates that the mixture model approach is more robust to retrieval noise in in-context learning, providing a more reliable estimation of data contributions.
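The with-and-without-context comparison in approach (1) can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's scoring rule: it assumes each LLM output has already been turned into a numeric vector (for example, an answer-probability vector or an embedding), and it uses one minus cosine similarity as the difference measure. The names `context_shift` and `dataset_shift_scores` are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def context_shift(out_without, out_with):
    """How much one output vector moves when context is added (assumed metric)."""
    return 1.0 - cosine_similarity(out_without, out_with)

def dataset_shift_scores(outs_without, outs_with_by_dataset):
    """Average per-query shift for each candidate dataset.

    outs_without: array of shape (n_queries, d), outputs with no context.
    outs_with_by_dataset: dict mapping a dataset name to an array of shape
        (n_queries, d), outputs when context retrieved from that dataset
        is placed in the prompt.
    """
    return {
        name: float(np.mean([context_shift(b, w)
                             for b, w in zip(outs_without, outs_with)]))
        for name, outs_with in outs_with_by_dataset.items()
    }

# Toy vectors standing in for model outputs on two queries.
base = np.array([[1.0, 0.0], [0.0, 1.0]])
scores = dataset_shift_scores(base, {
    "dataset_a": np.array([[1.0, 0.0], [0.0, 1.0]]),   # context changes nothing
    "dataset_b": np.array([[0.0, 1.0], [1.0, 0.0]]),   # context flips the output
})
```

The sketch only measures how far the output moves. Turning that shift into a contribution score is the part defined by the paper's method and is not reproduced here.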
