
MetaRank: Task-Aware Metric Selection for Model Transferability Estimation

Yuhang Liu, Wenjie Zhao, Yunhui Guo

TL;DR

Model transferability estimation (MTE) relies on proxy metrics to rank pretrained source models, but no single MTE metric is universally optimal across target datasets. MetaRank treats metric selection as a learning-to-rank problem: it uses language-model embeddings to map dataset and metric descriptions into a shared semantic space and trains an offline meta-predictor with a listwise objective to predict metric rankings. Online, MetaRank quickly ranks the candidate MTE metrics for a new dataset, enabling a priori selection of the most suitable metric. Across 11 source models and 11 target datasets, MetaRank achieves the best average rank among the compared selection methods and provides robust, task-aware metric selection that reduces the risk of poor transferability estimates.
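An average rank like the one reported in Figure 4 is typically computed by ranking the competing selection methods on each target dataset and then averaging those ranks across datasets, so lower is better. The sketch below assumes each method receives one score per dataset (higher is better, for example the $\tau_w$ of the metric it selects) and that tied methods share the mean of their positions; the function names and the numbers in the usage lines are hypothetical and are not results from the paper.

```python
from statistics import mean


def rank_desc(values: list[float]) -> list[float]:
    """Return 1-based ranks where the highest value gets rank 1.

    Tied values share the average of the positions they occupy.
    """
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over every item tied with the item at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos..end
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def average_ranks(scores: dict[str, list[float]]) -> dict[str, float]:
    """scores[method][t] is a method's quality on dataset t (higher is better)."""
    methods = list(scores)
    n_datasets = len(scores[methods[0]])
    # One rank vector per dataset, indexed in the same order as `methods`.
    per_dataset = [rank_desc([scores[m][t] for m in methods]) for t in range(n_datasets)]
    return {m: mean(ranks[i] for ranks in per_dataset) for i, m in enumerate(methods)}


# Hypothetical per-dataset scores for three selection methods on three datasets.
example = {
    "ad_hoc_single_metric": [0.40, 0.55, 0.30],
    "other_meta_learner": [0.45, 0.50, 0.35],
    "task_aware_selector": [0.50, 0.60, 0.35],
}
print(average_ranks(example))
```

With these made-up numbers the task-aware selector ranks first, first, and tied-first (1.5), giving an average rank of about 1.17, while the single fixed metric averages about 2.67. Averaging ranks rather than raw scores keeps one dataset with unusually large score differences from dominating the comparison.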

Abstract

Selecting an appropriate pretrained source model is a critical, yet computationally expensive, task in transfer learning. Model Transferability Estimation (MTE) methods address this by providing efficient proxy metrics to rank models without full fine-tuning. In practice, the choice of which MTE metric to use is often ad hoc or guided simply by a metric's average historical performance. However, we observe that the effectiveness of MTE metrics is highly task-dependent and no single metric is universally optimal across all target datasets. To address this gap, we introduce MetaRank, a meta-learning framework for automatic, task-aware MTE metric selection. We formulate metric selection as a learning-to-rank problem. Rather than relying on conventional meta-features, MetaRank encodes textual descriptions of both datasets and MTE metrics using a pretrained language model, embedding them into a shared semantic space. A meta-predictor is then trained offline on diverse meta-tasks to learn the intricate relationship between dataset characteristics and metric mechanisms, optimized with a listwise objective that prioritizes correctly ranking the top-performing metrics. During the subsequent online phase, MetaRank efficiently ranks the candidate MTE metrics for a new, unseen target dataset based on its textual description, enabling practitioners to select the most appropriate metric a priori. Extensive experiments across 11 pretrained models and 11 target datasets demonstrate the effectiveness of our approach.
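The offline/online split described above can be illustrated with a minimal sketch under several assumptions that the abstract does not confirm. Text embeddings of the dataset and metric descriptions are taken as given (random unit vectors stand in for language-model outputs); the meta-predictor $f_\theta$ is assumed to be a bilinear scorer $s_j = d^\top W m_j$, so a metric's score depends on the dataset; and the listwise objective is swapped in as a position-weighted ListMLE (Plackett-Luce) loss whose weights $1/\log_2(k+2)$ emphasize the top positions. All function names are hypothetical.

```python
import numpy as np


def scores(W: np.ndarray, d: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Bilinear meta-predictor: s_j = d^T W m_j for every metric embedding m_j (rows of M)."""
    return M @ (W.T @ d)


def listmle_loss_grad(W, d, M, order, alpha):
    """Position-weighted ListMLE loss for one meta-task and its gradient w.r.t. W.

    order: metric indices sorted from best to worst ground-truth performance.
    alpha: per-position weights; larger early weights stress the top ranks.
    """
    s_sorted = scores(W, d, M)[order]
    g_sorted = np.zeros(len(order))
    loss = 0.0
    for k in range(len(order)):
        # Plackett-Luce term: the item ranked k-th should beat everything below it.
        suffix = s_sorted[k:]
        shifted = np.exp(suffix - suffix.max())  # numerically stable softmax
        loss += alpha[k] * (suffix.max() + np.log(shifted.sum()) - s_sorted[k])
        g_sorted[k:] += alpha[k] * shifted / shifted.sum()
        g_sorted[k] -= alpha[k]
    g = np.zeros(len(order))
    g[order] = g_sorted  # map gradients back to the original metric indices
    return loss, np.outer(d, M.T @ g)  # ds_j/dW = d m_j^T


def train(tasks, dim_d, dim_m, lr=0.05, epochs=300, seed=0):
    """Offline phase: full-batch gradient descent over (d, M, order) meta-tasks."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((dim_d, dim_m))
    for _ in range(epochs):
        grad_sum = np.zeros_like(W)
        for d, M, order in tasks:
            alpha = 1.0 / np.log2(np.arange(len(order)) + 2.0)
            grad_sum += listmle_loss_grad(W, d, M, order, alpha)[1]
        W -= lr * grad_sum / len(tasks)
    return W


def rank_metrics(W, d_new, M):
    """Online phase: indices of candidate metrics for a new dataset, best first."""
    return np.argsort(-scores(W, d_new, M))


def unit(x):
    """Normalize vectors (rows) to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Synthetic stand-ins: 5 metric embeddings, 10 training datasets, dimension 8.
rng = np.random.default_rng(1)
M = unit(rng.standard_normal((5, 8)))
W_hidden = rng.standard_normal((8, 8))  # generates toy "ground-truth" rankings
tasks = [(d, M, rank_metrics(W_hidden, d, M)) for d in unit(rng.standard_normal((10, 8)))]
W = train(tasks, 8, 8)
print(rank_metrics(W, unit(rng.standard_normal(8)), M))
```

The bilinear form is a deliberate choice in this sketch: a linear scorer on concatenated dataset and metric features adds the same dataset term to every metric's score, so it could never change the metric ranking from one dataset to another. Unit-normalizing the embeddings keeps the loss smooth enough that the fixed step size of 0.05 decreases it steadily.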

Paper Structure

This paper contains 21 sections, 3 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Comparison of ad-hoc vs. task-aware metric selection. Ad-hoc selection employs a single MTE metric (e.g., NCTI), which may achieve the highest average performance but fails to identify the optimal metric for each target dataset. In contrast, task-aware selection adapts the metric choice per dataset (e.g., SFDA for Aircraft and H-Score for Cars), enabling per-dataset optimal metric identification and improved transferability estimation.
  • Figure 2: Performance evaluation of MTE metrics across diverse target datasets. The heatmap visualizes the weighted Kendall rank correlation coefficient ($\tau_{w}$) between each metric's prediction and the ground-truth fine-tuned performance, with per-dataset best cells outlined in black. The results demonstrate that no single metric is universally optimal.
  • Figure 3: Overview of the proposed MetaRank for MTE metric selection. Offline training learns a meta-predictor $f_\theta$ from dataset–metric representations and ground-truth rankings; online testing ranks candidate metrics for a new target dataset.
  • Figure 4: Average Rank of Different Selection Methods on All Datasets. Lower rank, positioned to the left, indicates better performance. Our proposed MetaRank outperforms both ad-hoc selection and other meta-learners.
  • Figure 5: Rank Distribution of Different Selection Methods. The orange line inside each box is the median; the box edges are the first quartile and the third quartile; whiskers extend to the most extreme points within 1.5 × interquartile range (IQR); the green triangle marks the mean; the colored circles show per-dataset ranks with slight horizontal jitter; and hollow circles denote outliers beyond the whiskers.
  • ...and 6 more figures
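Figure 2 scores each metric by the weighted Kendall rank correlation $\tau_w$ between its predicted model ranking and the fine-tuned ground truth. The sketch below is a simple $O(n^2)$ version in the style of Vigna's weighted $\tau$, assuming additive hyperbolic weights $1/(r+1)$, where $r$ is a model's 0-based rank under the ground truth, so that swapping two top models costs more than swapping two weak ones. The paper's exact weighting may differ; SciPy's `scipy.stats.weightedtau` is an optimized implementation whose default (it averages the results obtained with the rankings induced by each of the two inputs) need not match this sketch. The function name and the example numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def weighted_kendall_tau(pred: list[float], truth: list[float]) -> float:
    """Weighted Kendall tau between metric scores (pred) and fine-tuned accuracy (truth)."""
    n = len(pred)
    if n != len(truth) or n < 2:
        raise ValueError("need two equal-length lists with at least two items")
    # 0-based rank of each model under the ground truth (0 = best fine-tuned model).
    rank = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda k: truth[k], reverse=True)):
        rank[i] = r
    weight = [1.0 / (r + 1) for r in rank]  # hyperbolic weigher

    num = den_pred = den_truth = 0.0
    for i, j in combinations(range(n), 2):
        w = weight[i] + weight[j]  # additive pair weight
        sp, st = _sign(pred[i] - pred[j]), _sign(truth[i] - truth[j])
        num += w * sp * st  # +w if concordant, -w if discordant, 0 on a tie
        den_pred += w * sp * sp
        den_truth += w * st * st
    if den_pred == 0 or den_truth == 0:
        return 0.0  # a constant list has no ranking to correlate with
    return num / sqrt(den_pred * den_truth)


# Hypothetical example with five source models.
accuracy = [0.91, 0.88, 0.85, 0.80, 0.72]     # ground-truth fine-tuned accuracy
swap_top = [0.88, 0.91, 0.85, 0.80, 0.72]     # metric confuses the two best models
swap_bottom = [0.91, 0.88, 0.85, 0.72, 0.80]  # metric confuses the two worst models
print(weighted_kendall_tau(swap_top, accuracy), weighted_kendall_tau(swap_bottom, accuracy))
```

Both hypothetical metrics make exactly one pairwise mistake, yet the first call returns about 0.67 and the second about 0.90: the top-weighted correlation penalizes confusing the best candidates far more, which matches the goal of picking the right source model.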