Statistical inference on black-box generative models in the data kernel perspective space
Hayden Helm, Aranyak Acharyya, Brandon Duderstadt, Youngser Park, Carey E. Priebe
TL;DR
The paper tackles statistical inference across populations of black-box generative models when model covariates are unavailable. It introduces the Data Kernel Perspective Space (DKPS), which builds low-dimensional model representations from embeddings of the models' responses to a set of queries and then applies multidimensional scaling to map the models into Euclidean space, enabling model-level inference tasks. The authors prove consistency results showing that the risk based on proxies converges to the oracle risk as the number of queries $m$, replicates $r$, and models $n$ grow. Empirically, they demonstrate that DKPS supports tasks such as detecting sensitive information leakage and predicting toxicity and bias, with competitive accuracy and faster inference than API-based scoring. The framework offers a scalable auditing tool for large model populations, with practical guidance on query design and the choice of distance metric.
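The pipeline described above (embed responses, compare models, apply multidimensional scaling) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `dkps_embed`, the input layout, the averaging over replicates, the Frobenius distance between models, and the use of classical (eigendecomposition-based) MDS are all assumptions made for illustration.

```python
import numpy as np


def dkps_embed(responses, d=2):
    """Map n models to d-dimensional points (hypothetical helper).

    responses: array of shape (n_models, m_queries, r_replicates, p),
    holding a p-dimensional embedding of each response. This layout is
    an assumption, not something the source specifies.
    """
    X = np.asarray(responses, dtype=float)
    n = X.shape[0]

    # Assumption: summarize each (model, query) pair by its mean embedding
    # over the r replicates.
    means = X.mean(axis=2)            # shape (n, m, p)
    flat = means.reshape(n, -1)       # shape (n, m * p)

    # Assumption: compare models with the Frobenius distance between
    # their (m x p) matrices of mean embeddings.
    diff = flat[:, None, :] - flat[None, :, :]
    D = np.linalg.norm(diff, axis=-1)  # shape (n, n)

    # Classical MDS: double-center the squared distances, then keep the
    # top-d eigenpairs of the resulting Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:d]      # indices of the d largest
    top = np.clip(eigvals[order], 0.0, None)   # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top)    # shape (n, d)
```

Each row of the returned array is one model's coordinate vector, which could then be passed to an ordinary classifier or regressor for a model-level task. The choice of distance is one of the design decisions the summary mentions, so the Frobenius distance here should be read as a placeholder.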
Abstract
Generative models are capable of producing human-expert-level content across a variety of topics and domains. As the impact of generative models grows, it is necessary to develop statistical methods to understand collections of available models. These methods are particularly important in settings where the user may not have access to information related to a model's pre-training data, weights, or other relevant model-level covariates. In this paper, we extend recent results on representations of black-box generative models to model-level statistical inference tasks. We demonstrate that the model-level representations are effective for multiple inference tasks.
