A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models
Sebastian G. Gruber, Florian Buettner
TL;DR
This work introduces a bias-variance-covariance decomposition (BVCD) of kernel scores for analyzing generalization and uncertainty in generative models. It defines a distributional variance $\operatorname{Var}_k(P)$ and a distributional covariance $\operatorname{Cov}_k(P,Q)$ and shows that the expected kernel score decomposes as $\mathbb{E}[S_k(\hat{P},Y)] = -\lVert Q\rVert_k^2 + \lVert \mathbb{E}[\hat{P}] - Q \rVert_k^2 + \operatorname{Var}_k(\hat{P})$, where $Q$ is the distribution of the target $Y$; for an ensemble, an additional covariance term appears. The authors provide unbiased, consistent estimators $\widehat{\operatorname{Var}}_k^{(n,m)}$ and $\widehat{\operatorname{Cov}}_k^{(n,m)}$ that rely only on samples, which makes the BVCD analysis applicable to both open- and closed-source models. Empirically, kernel entropy is strongly predictive of generalization in image and audio tasks and outperforms baselines in uncertainty estimation for NLP question answering on CoQA and TriviaQA. The framework offers a transferable, kernel-based approach to quantifying uncertainty in diverse generative settings and gives practical guidance on kernel choice and sample requirements.
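The decomposition stated above can be checked in two short steps. The sketch below is not taken from the paper; it assumes the kernel score has the form $S_k(P,y) = \lVert P\rVert_k^2 - 2\langle P, k(y,\cdot)\rangle_k$, that $\langle P,Q\rangle_k = \mathbb{E}_{X\sim P,\,Y\sim Q}[k(X,Y)]$, that $\operatorname{Var}_k(\hat{P}) = \mathbb{E}\lVert \hat{P} - \mathbb{E}[\hat{P}]\rVert_k^2$, and that $Y \sim Q$ is independent of $\hat{P}$. These definitions are assumptions chosen to be consistent with the three terms in the formula.

```latex
% Step 1: average over the target Y ~ Q for a fixed prediction P,
% then complete the square.
\begin{align*}
\mathbb{E}_{Y}\big[S_k(P,Y)\big]
  &= \lVert P\rVert_k^2 - 2\langle P, Q\rangle_k
   = \lVert P - Q\rVert_k^2 - \lVert Q\rVert_k^2 .
\end{align*}

% Step 2: average over the random prediction \hat{P}; the cross term
% 2 E< \hat{P} - E[\hat{P}], E[\hat{P}] - Q >_k vanishes.
\begin{align*}
\mathbb{E}\,\lVert \hat{P} - Q\rVert_k^2
  &= \lVert \mathbb{E}[\hat{P}] - Q\rVert_k^2
   + \mathbb{E}\,\lVert \hat{P} - \mathbb{E}[\hat{P}]\rVert_k^2
   = \lVert \mathbb{E}[\hat{P}] - Q\rVert_k^2 + \operatorname{Var}_k(\hat{P}) .
\end{align*}

% Combining both steps:
\begin{align*}
\mathbb{E}\big[S_k(\hat{P},Y)\big]
  &= -\lVert Q\rVert_k^2
   + \lVert \mathbb{E}[\hat{P}] - Q\rVert_k^2
   + \operatorname{Var}_k(\hat{P}) .
\end{align*}
```

Under these assumptions the first term depends only on the target distribution, the second measures the squared kernel distance between the average prediction and the target, and the third captures how much the prediction varies.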
Abstract
Generative models, such as large language models, are becoming increasingly relevant in our daily lives, yet a theoretical framework to assess their generalization behavior and uncertainty does not exist. In particular, the problem of uncertainty estimation is commonly solved in an ad-hoc and task-dependent manner. For example, natural language approaches cannot be transferred to image generation. In this paper, we introduce the first bias-variance-covariance decomposition for kernel scores. This decomposition provides a theoretical framework from which we derive a kernel-based variance and entropy for uncertainty estimation. We propose unbiased and consistent estimators for each quantity that require only generated samples, not the underlying model itself. Building on the wide applicability of kernels, we demonstrate our framework via generalization and uncertainty experiments for image, audio, and language generation. Specifically, kernel entropy for uncertainty estimation is more predictive of performance on the CoQA and TriviaQA question answering datasets than existing baselines and can also be applied to closed-source models.
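To illustrate what "only require generated samples" can look like in practice, here is a minimal sketch of sample-based estimates. It is an illustration under stated assumptions, not the authors' implementation: it assumes an RBF kernel on vector embeddings of the generated outputs, takes kernel entropy to be $-\lVert P\rVert_k^2$ (the form matching the $-\lVert Q\rVert_k^2$ term in the decomposition), and takes the distributional variance to be $\mathbb{E}\lVert\hat{P}\rVert_k^2 - \lVert\mathbb{E}[\hat{P}]\rVert_k^2$. The function names `rbf_kernel`, `kernel_entropy`, and `distributional_variance` are hypothetical, and the estimators are standard U-statistics that may differ from the paper's $\widehat{\operatorname{Var}}_k^{(n,m)}$.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gram matrix of the RBF kernel between rows of a (n, d) and b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def kernel_entropy(samples, bandwidth=1.0):
    """U-statistic estimate of -||P||_k^2 from m >= 2 samples of P.

    Assumption: kernel entropy is -E[k(X, X')] for independent X, X' ~ P.
    Diagonal terms k(x_i, x_i) are excluded to keep the estimate unbiased.
    """
    x = np.asarray(samples, dtype=float)
    m = x.shape[0]
    if m < 2:
        raise ValueError("need at least two samples")
    gram = rbf_kernel(x, x, bandwidth)
    off_diagonal_sum = gram.sum() - np.trace(gram)
    return -off_diagonal_sum / (m * (m - 1))


def distributional_variance(groups, bandwidth=1.0):
    """Estimate E||P_hat||_k^2 - ||E[P_hat]||_k^2 from an (n, m, d) array.

    groups[i] holds m samples drawn from the i-th predicted distribution
    (for example, from n independently trained models). Assumes n >= 2
    and m >= 2. The result can be slightly negative on finite samples.
    """
    g = np.asarray(groups, dtype=float)
    n, m, _ = g.shape
    within = 0.0  # accumulates estimates of ||P_i||_k^2
    cross = 0.0   # accumulates estimates of <P_i, P_j>_k for i != j
    for i in range(n):
        gram_ii = rbf_kernel(g[i], g[i], bandwidth)
        within += (gram_ii.sum() - np.trace(gram_ii)) / (m * (m - 1))
        for j in range(n):
            if i != j:
                cross += rbf_kernel(g[i], g[j], bandwidth).mean()
    return within / n - cross / (n * (n - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 5 predicted distributions, 20 samples each, 3-dim embeddings.
    groups = rng.normal(size=(5, 20, 3))
    print("kernel entropy of first group:", kernel_entropy(groups[0]))
    print("distributional variance:", distributional_variance(groups))
```

Because both functions consume only arrays of samples, the same code would apply to outputs collected from a closed-source model, provided the outputs can be embedded into vectors for which the chosen kernel is meaningful.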
