Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach

Prince Aboagye; Yan Zheng; Junpeng Wang; Uday Singh Saini; Xin Dai; Michael Yeh; Yujie Fan; Zhongfang Zhuang; Shubham Jain; Liang Wang; Wei Zhang

TL;DR

The paper tackles the challenge of efficiently evaluating pretrained models without exhaustive downstream fine-tuning by introducing a posterior-based metric that quantifies the consistency between entity embeddings and their associated meta-features. The core idea is to partition the embedding space using meta-features (via single-feature clustering, EmbeddingTree segmentation, or MAP-guided 2-GMM splits) and measure the average log posterior (ALP) of embeddings to their clusters, with robustness mechanisms like clipping and multi-head subspaces. Key contributions include the ALP metric, EmbeddingTree-based segmentation, and cross-model embedding comparison under identical splits, demonstrated across synthetic data, MovieLens relational data, Llama-2, and CLIP BREEDS, showing strong alignment with downstream performance. This work offers a scalable, model-agnostic tool for rapid model ranking and benchmarking across modalities, enabling efficient model selection in practice.
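The scoring idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one Gaussian is fitted per meta-feature cluster, with a small ridge term on the covariance for numerical stability, and it approximates the multi-head variant by scoring contiguous slices of the embedding dimensions and averaging the results. The function names, the `reg` and `clip` defaults, and the contiguous-slice choice are all hypothetical, and every cluster is assumed to contain at least two entities.

```python
import numpy as np


def average_log_posterior(embeddings, labels, reg=1e-3, clip=-10.0):
    """Average log posterior (ALP) of each embedding under its own cluster.

    Assumptions (not from the paper): one Gaussian per cluster, covariance
    regularized by `reg`, and log posteriors clipped from below at `clip`.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted unique cluster ids
    n, d = embeddings.shape
    log_joint = np.empty((n, len(classes)))
    for k, c in enumerate(classes):
        members = embeddings[labels == c]
        mean = members.mean(axis=0)
        cov = np.cov(members, rowvar=False).reshape(d, d) + reg * np.eye(d)
        diff = embeddings - mean
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        log_lik = -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))
        # log p(x | cluster) + log prior, prior = empirical cluster frequency
        log_joint[:, k] = log_lik + np.log(len(members) / n)
    # Normalize over clusters with a numerically stable log-sum-exp.
    top = log_joint.max(axis=1, keepdims=True)
    log_norm = top + np.log(np.exp(log_joint - top).sum(axis=1, keepdims=True))
    log_post = log_joint - log_norm
    own = log_post[np.arange(n), np.searchsorted(classes, labels)]
    # Clipping limits the influence of a few badly placed entities.
    return float(np.clip(own, clip, 0.0).mean())


def multi_head_alp(embeddings, labels, head_dim=128, **kwargs):
    """Mean ALP over contiguous dimension subsets ("heads") of size head_dim."""
    embeddings = np.asarray(embeddings, dtype=float)
    scores = [
        average_log_posterior(embeddings[:, s:s + head_dim], labels, **kwargs)
        for s in range(0, embeddings.shape[1], head_dim)
    ]
    return float(np.mean(scores))
```

Under this sketch, a score close to 0 means the embeddings agree with the meta-feature partition, while more negative values mean weaker agreement. Comparing two models then amounts to computing the score for each model's embeddings under the same labels.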

Abstract

The emergence of pre-trained models has significantly impacted fields ranging from Natural Language Processing (NLP) and Computer Vision to relational datasets. Traditionally, these models are assessed through fine-tuned downstream tasks. However, this raises the question of how to evaluate these models more efficiently and more effectively. In this study, we explore a novel approach in which we leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models. We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models. Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models, and image models.

Paper Structure (21 sections, 12 equations, 16 figures, 14 tables, 1 algorithm)

Introduction
Related Work
Pre-trained models
Algorithm Framework
Proposed Method: Posterior Based Embedding Evaluating Metric
One Meta Feature Based Clustering
Meta features + representation based segmentation
2-GMM Splitting with Maximum A Posteriori Estimation (MAP)
Finding the Best Splitting Point
Embedding Comparison based on the same splitting criteria
Experimental Analysis
Synthetic Dataset: Gaussian Mixture Model (GMM) of Ten Gaussian Distributions
Movie Lens Dataset for Relational
Movie Lens Dataset: Clustering by Year
Movie Lens Dataset: Clustering by Genre
...and 6 more sections

Figures (16)

Figure 1: Illustration on a 2D synthetic dataset consisting of 10 Gaussian distributions that are perfectly separated, partially overlapping, and perfectly overlapping.
Figure 2: Mean of the average of the log posterior and accuracy on the MovieLens dataset by clustering on year.
Figure 3: Mean of the average of the log posterior and accuracy on the MovieLens dataset by clustering on the genre.
Figure 4: Mean of the average of the log posterior and accuracy on the MovieLens dataset by clustering with tree leaf nodes.
Figure 5: Embedding quality over model layers (average of the log posterior). The dimensions are divided into subsets, each comprising 128 dimensions.
...and 11 more figures
