Statistical inference on black-box generative models in the data kernel perspective space

Hayden Helm; Aranyak Acharyya; Brandon Duderstadt; Youngser Park; Carey E. Priebe

TL;DR

The paper tackles statistical inference across populations of black-box generative models when model covariates are unavailable. It introduces the Data Kernel Perspective Space (DKPS), which builds low-dimensional model representations from embeddings of responses to a set of queries and then applies multidimensional scaling to map models into Euclidean space, enabling model-level tasks. The authors prove consistency results showing risk based on proxies converges to the oracle risk as the numbers of queries $m$, replicates $r$, and models $n$ grow, and they demonstrate empirically that DKPS supports tasks such as detecting sensitive information leakage and predicting toxicity/bias with competitive accuracy and faster inference than API-based scoring. This framework offers a scalable auditing tool for large model populations, with practical guidance on query design and the choice of distance metrics.
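The construction summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `dkps`, the array layout, averaging over replicates, and the Frobenius distance between mean-response matrices are all assumptions made for the example (the summary only says that response embeddings are compared and then passed to multidimensional scaling). Classical MDS is used as one common variant of multidimensional scaling.

```python
import numpy as np


def dkps(embeddings: np.ndarray, d: int = 2) -> np.ndarray:
    """Map n models to d-dimensional points (hypothetical helper).

    embeddings has shape (n_models, m_queries, r_replicates, p_dim):
    one embedded response per model, query, and replicate.
    """
    n = embeddings.shape[0]

    # Assumption: summarize each model by its mean embedding per query.
    mean_resp = embeddings.mean(axis=2)   # shape (n, m, p)
    flat = mean_resp.reshape(n, -1)       # shape (n, m * p)

    # Assumption: Frobenius distance between the (m, p) mean-response matrices,
    # which equals the Euclidean distance between their flattened versions.
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # shape (n, n)

    # Classical MDS: double-center the squared distances, then eigendecompose.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)

    # Keep the d largest eigenvalues; clip tiny negatives from round-off.
    top = np.argsort(eigvals)[::-1][:d]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Toy usage with synthetic embeddings: 6 models, 4 queries, 3 replicates, 5 dims.
rng = np.random.default_rng(0)
toy = rng.normal(size=(6, 4, 3, 5))
coords = dkps(toy, d=2)  # one 2-d point per model
```

The resulting rows of `coords` are ordinary Euclidean vectors, so standard classifiers or regressors can be fit on them to predict model-level labels.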

Abstract

Generative models are capable of producing human-expert level content across a variety of topics and domains. As the impact of generative models grows, it is necessary to develop statistical methods to understand collections of available models. These methods are particularly important in settings where the user may not have access to information related to a model's pre-training data, weights, or other relevant model-level covariates. In this paper we extend recent results on representations of black-box generative models to model-level statistical inference tasks. We demonstrate that the model-level representations are effective for multiple inference tasks.

Paper Structure (18 sections, 2 theorems, 18 equations, 5 figures, 2 tables)

Introduction
Background & related work
The Data Kernel Perspective Space
Analytical properties of the DKPS
Statistical inference in the DKPS
Non-standard considerations
An illustrative example -- "Was RA Fisher great?"
Experiments
Has a model seen sensitive information?
How safe is a model?
Discussion
Limitations
Proofs of Theorems 1 & 2
Proof of Theorem 1
Proof of Theorem 2
...and 3 more sections

Key Result

Theorem 1

Under technical assumptions described in the appendix (Proofs of Theorems 1 & 2), the risk based on proxies converges to the oracle risk as $m, r \to \infty$, for every $n$.

Figures (5)

Figure 1: Left. The 2-d Data Kernel Perspective Space (DKPS) and covariate surface for a collection of 550 models parameterized by fixed augmentations. Right. The performance of the 1-nearest neighbor regressor in DKPS for predicting the probability that an unlabeled model responds "yes" to "Was RA Fisher great?".
Figure 2: Left. The 2-d data kernel perspective space (DKPS) of 50 fine-tuned models -- 25 with "sensitive" data in the fine-tuning data mixture (red), 25 with none (black) -- induced by an evaluation set containing 10 prompts relevant to the sensitive data. For models trained on sensitive data, color intensity correlates with amount of sensitive data in the training mixture. Center. The 2-d DKPS of the models induced by a set of 10 prompts "orthogonal" to the difference between models with sensitive data in their fine-tuning data mixture and models with no sensitive data in their fine-tuning data mixture. Right. Classification performance as a function of number of labeled models and size of evaluation set for both sensitive and orthogonal evaluation sets.
Figure 3: Top. The 1-d FLD projection of the models from a DKPS induced by queries from the sensitive topic versus the amount of sensitive data the adapter had access to during training. Bottom. The same but for a DKPS induced by queries from orthogonal topics.
Figure 4: Left. A graph where each node is a model and an edge between two models exists if model $i$ is fine-tuned from model $i'$ or if the weights of model $i'$ were used in a model-merge that resulted in model $i$, etc. Right, top. The two-dimensional data kernel perspective spaces (DKPS) corresponding to the toxicity and bias prediction tasks. Dot size is proportional to model toxicity or bias. Right, bottom. Relative performance of three regression techniques.
Figure 5: The relative time improvement (larger is better) when using local predictions in DKPS instead of calculating the ground-truth model-level covariate using HuggingFace's API.

Theorems & Definitions (2)

Theorem 1
Theorem 2
