Comparing Foundation Models using Data Kernels
Brandon Duderstadt, Hayden S. Helm, Carey E. Priebe
TL;DR
This work addresses the challenge of comparing foundation models without committing to a single downstream metric by comparing the geometry of their embedding spaces directly. From a model's embeddings $Y$ of a dataset, it builds a data kernel $A = \text{TOP}_k(YY^{\top})$ and models it as a Random Dot Product Graph, $A \sim \text{RDPG}(ZZ^{\top})$, so that the latent positions $Z$ can be estimated consistently, up to an orthogonal transformation, by adjacency spectral embedding. A joint omnibus embedding aligns the data kernels of multiple models, which enables bootstrap-based hypothesis testing on a per-datum basis; an ablation study demonstrates that this test can surface representation changes caused by data interventions. Extending to population-level analysis, the paper defines a model-manifold distance based on the aligned latent positions and shows that it correlates with downstream metrics such as classifier agreement and pseudo-perplexity. This supports a taxonomic view of foundation-model families and suggests avenues for model selection and privacy-aware analysis.
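To make the pipeline summarized above concrete, here is a minimal NumPy sketch of the three steps: building a data kernel, embedding it spectrally, and jointly embedding two kernels. It is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. $\text{TOP}_k$ is read here as keeping the $k$ largest off-diagonal entries in each row of $YY^{\top}$ and symmetrizing the result. The omnibus matrix uses the common construction with averaged off-diagonal blocks, which may differ in detail from the paper's.

```python
import numpy as np


def data_kernel(Y, k):
    """Binary k-nearest-neighbour graph built from the Gram matrix Y Y^T.

    Assumption: TOP_k keeps the k largest off-diagonal entries per row,
    and the graph is symmetrized afterwards.
    """
    Y = np.asarray(Y, dtype=float)
    S = Y @ Y.T
    np.fill_diagonal(S, -np.inf)  # a point is never its own neighbour
    n = S.shape[0]
    # Column indices of the k largest similarities in each row.
    idx = np.argpartition(-S, k - 1, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.arange(n)[:, None], idx] = 1.0
    return np.maximum(A, A.T)  # symmetrize (assumption)


def adjacency_spectral_embedding(A, d):
    """Estimate d-dimensional latent positions from a symmetric matrix.

    Uses the d eigenpairs of largest magnitude, scaled by sqrt(|eigenvalue|).
    The result is only defined up to an orthogonal transformation.
    """
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))


def omnibus_embedding(A1, A2, d):
    """Jointly embed two kernels on the same data so they share one frame.

    Assumption: off-diagonal blocks are the average of the two kernels.
    Returns one (n, d) array of latent positions per model.
    """
    avg = (A1 + A2) / 2.0
    M = np.block([[A1, avg], [avg, A2]])
    Z = adjacency_spectral_embedding(M, d)
    n = A1.shape[0]
    return Z[:n], Z[n:]


# Toy usage: random arrays stand in for two models' embeddings of 20 data.
rng = np.random.default_rng(0)
Y1 = rng.normal(size=(20, 5))
Y2 = rng.normal(size=(20, 5))
A1, A2 = data_kernel(Y1, k=3), data_kernel(Y2, k=3)
Z1, Z2 = omnibus_embedding(A1, A2, d=2)
# Per-datum displacement between the two aligned representations.
displacement = np.linalg.norm(Z1 - Z2, axis=1)
```

In this sketch, `displacement[i]` plays the role of a per-datum test statistic. The bootstrap null distribution needed to turn it into a p-value is not shown.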
Abstract
Recent advances in self-supervised learning and neural network scaling have enabled the creation of large models, known as foundation models, which can be easily adapted to a wide range of downstream tasks. The current paradigm for comparing foundation models involves evaluating them with aggregate metrics on various benchmark datasets. This method of model comparison is heavily dependent on the chosen evaluation metric, which makes it unsuitable for situations where the ideal metric is either not obvious or unavailable. In this work, we present a methodology for directly comparing the embedding space geometry of foundation models, which facilitates model comparison without the need for an explicit evaluation metric. Our methodology is grounded in random graph theory and enables valid hypothesis testing of embedding similarity on a per-datum basis. Further, we demonstrate how our methodology can be extended to facilitate population-level model comparison. In particular, we show how our framework can induce a manifold of models equipped with a distance function that correlates strongly with several downstream metrics. We remark on the utility of this population-level model comparison as a first step towards a taxonomic science of foundation models.
