TransformerRanker: A Tool for Efficiently Finding the Best-Suited Language Models for Downstream Classification Tasks

Lukas Garbas, Max Ploner, Alan Akbik

TL;DR

TransformerRanker tackles efficient PLM selection for downstream classification by ranking models with transferability estimators and layer aggregation, without any fine-tuning. It integrates three estimators (kNN, LogME, and H-Score) with three layer-aggregation strategies (last layer, layer mean, and best layer) in a PyTorch workflow built on HuggingFace. Empirical results show strong alignment between the transferability-based rankings and actual fine-tuned performance, particularly for H-Score with layer-mean aggregation, and demonstrate practical speedups from dataset downsampling and GPU acceleration. A demonstration on GermEval18 highlights the tool's utility for rapid, cost-effective model selection in NLP pipelines.
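The estimator and layer-aggregation steps can be sketched as follows. This is a minimal NumPy illustration, not TransformerRanker's code: it uses the standard H-Score definition, tr(pinv(cov(f)) · cov(E[f | y])), with a pseudo-inverse (the library may regularize the covariance differently), and it assumes the per-layer embeddings have already been pooled into one array of shape (num_layers, n_samples, dim), for example from a HuggingFace model called with `output_hidden_states=True`. The function names `h_score`, `aggregate_layers`, and `score_model` are hypothetical.

```python
import numpy as np


def h_score(features: np.ndarray, labels: np.ndarray) -> float:
    """H-Score: tr(pinv(cov(f)) @ cov(E[f | y])).

    features: (n_samples, dim) pooled embeddings; labels: (n_samples,) class ids.
    """
    f = features - features.mean(axis=0, keepdims=True)  # center the features
    n = f.shape[0]
    cov_f = f.T @ f / n  # total feature covariance, shape (dim, dim)

    # Replace every sample by the mean embedding of its class: g_i = E[f | y = y_i].
    g = np.zeros_like(f)
    for c in np.unique(labels):
        mask = labels == c
        g[mask] = f[mask].mean(axis=0)
    cov_g = g.T @ g / n  # between-class covariance (g is already centered)

    # The pseudo-inverse keeps this defined when cov_f is singular (small n, large dim).
    return float(np.trace(np.linalg.pinv(cov_f) @ cov_g))


def aggregate_layers(hidden_states: np.ndarray, mode: str) -> np.ndarray:
    """Collapse (num_layers, n_samples, dim) into (n_samples, dim)."""
    if mode == "last":
        return hidden_states[-1]  # use only the final layer
    if mode == "mean":
        return hidden_states.mean(axis=0)  # average the layers element-wise
    raise ValueError(f"unsupported aggregation mode: {mode!r}")


def score_model(hidden_states: np.ndarray, labels: np.ndarray, mode: str = "mean") -> float:
    """Score one model; 'best' scores each layer separately and keeps the maximum."""
    if mode == "best":
        return max(h_score(layer, labels) for layer in hidden_states)
    return h_score(aggregate_layers(hidden_states, mode), labels)
```

The `best` mode calls the estimator once per layer instead of once per model, which is why it is more expensive than `last` or `mean`.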

Abstract

Classification tasks in NLP are typically addressed by selecting a pre-trained language model (PLM) from a model hub, and fine-tuning it for the task at hand. However, given the very large number of PLMs that are currently available, a practical challenge is to determine which of them will perform best for a specific downstream task. With this paper, we introduce TransformerRanker, a lightweight library that efficiently ranks PLMs for classification tasks without the need for computationally costly fine-tuning. Our library implements current approaches for transferability estimation (LogME, H-Score, kNN), in combination with layer aggregation options, which we empirically showed to yield state-of-the-art rankings of PLMs (Garbas et al., 2024). We designed the interface to be lightweight and easy to use, allowing users to directly connect to the HuggingFace Transformers and Datasets libraries. Users need only select a downstream classification task and a list of PLMs to create a ranking of likely best-suited PLMs for their task. We make TransformerRanker available as a pip-installable open-source library at https://github.com/flairNLP/transformer-ranker.
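The ranking step, scoring every candidate PLM on the same labeled data and sorting, can be sketched with a kNN estimator. This minimal sketch assumes each model's sentence embeddings have already been extracted; it scores a model by leave-one-out k-nearest-neighbour accuracy (Euclidean distance and k=3 are assumptions) and sorts models by that score. The helper names `knn_score` and `rank_models` and the model names in the usage line are hypothetical and do not reflect TransformerRanker's API.

```python
import numpy as np


def knn_score(features: np.ndarray, labels: np.ndarray, k: int = 3) -> float:
    """Leave-one-out kNN accuracy, used as a transferability proxy.

    labels must be non-negative integer class ids (needed by np.bincount).
    """
    sq = (features ** 2).sum(axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * features @ features.T  # squared Euclidean
    np.fill_diagonal(dists, np.inf)  # a sample may not vote for itself
    neighbours = np.argsort(dists, axis=1)[:, :k]  # indices of the k closest samples
    neighbour_labels = labels[neighbours]  # shape (n_samples, k)
    # Majority vote; ties go to the smallest class id because argmax returns the first max.
    preds = np.array([np.bincount(row).argmax() for row in neighbour_labels])
    return float((preds == labels).mean())


def rank_models(embeddings_per_model: dict, labels: np.ndarray, estimator=knn_score) -> list:
    """Return (model_name, score) pairs sorted from best to worst."""
    scores = {name: estimator(feats, labels) for name, feats in embeddings_per_model.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Usage: `rank_models({"model-a": emb_a, "model-b": emb_b}, labels)` returns the models ordered by estimated suitability. Any function with the same `(features, labels) -> float` signature, such as an H-Score implementation, can be passed as `estimator`.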

Paper Structure

This paper contains 21 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The three steps of TransformerRanker: (1) The user selects a downstream classification task by selecting a dataset from HuggingFace datasets. (2) The user also selects a list of language models from HuggingFace transformers. (3) Using the selected estimator, the library returns a ranking of which PLMs are likely to perform best on the selected task.
  • Figure 2: A ranking of 20 language models produced by TransformerRanker for the CoNLL-03 shared task data. The output is ordered by rank, with the estimated best-suited model at the top of the list. For each model, the H-score is printed in the third column. Using these results, a user may exclude the lower-ranked models and focus only on the top-ranked models for further exploration.
  • Figure 3: Time taken to estimate the transferability of a single model. We include the download time (20 seconds) and report runtimes for different downsampled subsets of the CoNLL-03 dataset. The plot also shows how the ranking correlation changes across these subsets. We used the default settings: layer-mean aggregation with the H-score estimator. Estimation was done with a batch size of 64 on a single NVIDIA A100 (80GB) GPU.
  • Figure 4: GermEval18 ranking result with H-scores for models from the predefined list of smaller PLMs.
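The two quantities traded off in Figure 3, the size of the downsampled dataset and the correlation between the estimated ranking and fine-tuned performance, can be illustrated with a short sketch. It assumes uniform random downsampling without replacement and uses Spearman rank correlation as a stand-in measure; neither is confirmed as the exact procedure used by the library or the paper, and the function names `downsample_indices` and `spearman` are hypothetical.

```python
import numpy as np


def downsample_indices(n_samples: int, ratio: float, seed: int = 42) -> np.ndarray:
    """Pick a random subset covering `ratio` of the dataset, without replacement."""
    rng = np.random.default_rng(seed)
    size = max(1, int(round(ratio * n_samples)))
    return np.sort(rng.choice(n_samples, size=size, replace=False))


def spearman(a, b) -> float:
    """Spearman rank correlation, assuming no ties: Pearson correlation of the ranks."""
    rank_a = np.argsort(np.argsort(a))  # position of each value in sorted order
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])
```

For example, a 0.2 ratio on a 1,000-sentence dataset keeps 200 sentences, and a correlation of 1.0 means the estimator orders the models exactly as their fine-tuned scores do.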