Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach
Prince Aboagye, Yan Zheng, Junpeng Wang, Uday Singh Saini, Xin Dai, Michael Yeh, Yujie Fan, Zhongfang Zhuang, Shubham Jain, Liang Wang, Wei Zhang
TL;DR
The paper tackles the challenge of evaluating pretrained models efficiently, without exhaustive downstream fine-tuning, by introducing a posterior-based metric that quantifies the consistency between entity embeddings and their associated meta-features. The core idea is to partition the embedding space using meta-features (via single-feature clustering, EmbeddingTree segmentation, or MAP-guided 2-GMM splits) and to measure the average log posterior (ALP) of each embedding with respect to its assigned cluster, with clipping and multi-head subspaces added for robustness. Key contributions include the ALP metric, EmbeddingTree-based segmentation, and comparison of embeddings from different models under identical splits. Experiments on synthetic data, MovieLens relational data, Llama-2, and CLIP on BREEDS show that the metric aligns closely with downstream performance. The result is a scalable, model-agnostic tool for rapidly ranking and benchmarking models across modalities, which makes model selection more efficient in practice.
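To make the ALP idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes cluster labels are already given (for example, one label per entity taken from a single categorical meta-feature), models each cluster as a diagonal Gaussian whose prior is proportional to its size, floors each log posterior at an illustrative `clip=-10.0`, and forms "heads" as contiguous, equal-width slices of the embedding dimensions. All function names are hypothetical, and the paper's exact choices for covariance, clipping, and subspace construction may differ.

```python
import math
from collections import defaultdict


def fit_diag_gaussians(embeddings, labels, var_floor=1e-6):
    """Fit one Gaussian per cluster (assumption: diagonal covariance)."""
    groups = defaultdict(list)
    for x, y in zip(embeddings, labels):
        groups[y].append(x)
    n = len(embeddings)
    params = {}
    for y, xs in groups.items():
        d = len(xs[0])
        mean = [sum(x[j] for x in xs) / len(xs) for j in range(d)]
        # Per-dimension variance, floored so singleton clusters stay finite.
        var = [max(sum((x[j] - mean[j]) ** 2 for x in xs) / len(xs), var_floor)
               for j in range(d)]
        prior = len(xs) / n  # assumption: prior proportional to cluster size
        params[y] = (mean, var, prior)
    return params


def log_joint(x, mean, var, prior):
    """log p(x | cluster) + log p(cluster) under a diagonal Gaussian."""
    log_lik = -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                         for xi, m, v in zip(x, mean, var))
    return log_lik + math.log(prior)


def average_log_posterior(embeddings, labels, clip=-10.0):
    """Mean of log p(assigned cluster | x); each term is floored at `clip`
    (the floor value is an illustrative assumption)."""
    params = fit_diag_gaussians(embeddings, labels)
    total = 0.0
    for x, y in zip(embeddings, labels):
        joints = {c: log_joint(x, *p) for c, p in params.items()}
        # Log-sum-exp over clusters gives the log evidence log p(x).
        top = max(joints.values())
        log_evidence = top + math.log(sum(math.exp(v - top) for v in joints.values()))
        total += max(joints[y] - log_evidence, clip)
    return total / len(embeddings)


def multi_head_alp(embeddings, labels, num_heads=2, clip=-10.0):
    """Average ALP over contiguous slices ("heads") of the embedding dimensions
    (assumption: heads are equal-width slices, last one takes the remainder)."""
    d = len(embeddings[0])
    if not 1 <= num_heads <= d:
        raise ValueError("num_heads must be between 1 and the embedding dimension")
    size = d // num_heads
    scores = []
    for h in range(num_heads):
        start = h * size
        end = d if h == num_heads - 1 else start + size
        sub = [x[start:end] for x in embeddings]
        scores.append(average_log_posterior(sub, labels, clip))
    return sum(scores) / num_heads


if __name__ == "__main__":
    points = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.05],
              [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]]
    consistent = ["a", "a", "a", "b", "b", "b"]    # labels match the geometry
    inconsistent = ["a", "b", "a", "b", "a", "b"]  # labels ignore the geometry
    print(average_log_posterior(points, consistent))
    print(average_log_posterior(points, inconsistent))
    print(multi_head_alp(points, consistent, num_heads=2))
```

When the labels match the geometry, each point's posterior for its own cluster is essentially 1, so the ALP is close to 0; labels that cut across the geometry push it below 0. Under this sketch, a higher (closer to zero) ALP therefore signals embeddings that are more consistent with the meta-features.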
Abstract
The emergence of pre-trained models has significantly impacted fields ranging from Natural Language Processing (NLP) and Computer Vision to relational datasets. Traditionally, these models are assessed by fine-tuning them on downstream tasks. This raises the question of how to evaluate them more efficiently and effectively. In this study, we explore a novel approach that treats the meta-features associated with each entity as a source of world knowledge and uses the entity representations produced by the models. We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models. We demonstrate the method's effectiveness across various domains, including models for relational datasets, large language models, and image models.
