Enabling Small Models for Zero-Shot Selection and Reuse through Model Label Learning

Jia Zhang, Zhi Zhou, Lan-Zhe Guo, Yu-Feng Li

TL;DR

This work presents Model Label Learning (MLL), a hub-based framework that gives task-specific expert models zero-shot capability. MLL labels each model with its functional semantics via a Semantic Directed Acyclic Graph (SDAG), then selects and ensembles models with Classification Head Combination Optimization (CHCO). Unlike large vision-language models (VLMs), MLL is a lower-cost, scalable approach whose zero-shot ability grows as the model hub expands. The method comprises three steps (model labelling, model selection, and model reuse), with an interim patch that falls back on a general VLM when the hub's coverage is incomplete. Experiments on seven real-world datasets show that expert models can be effectively reused for zero-shot classification, and that CHCO consistently improves reuse efficiency and performance as the hub scales, which supports the viability of building scalable, reusable model portfolios for zero-shot tasks.

Abstract

Vision-language models (VLMs) like CLIP have demonstrated impressive zero-shot ability in image classification tasks by aligning text and images, but they underperform task-specific expert models. In contrast, expert models excel in their specialized domains but lack zero-shot ability for new tasks. How to obtain both the high performance of expert models and zero-shot ability is an important research direction. In this paper, we attempt to demonstrate that by constructing a model hub and aligning models with their functionalities using model labels, new tasks can be solved in a zero-shot manner by effectively selecting and reusing models in the hub. We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities through a Semantic Directed Acyclic Graph (SDAG) and leverages an algorithm, Classification Head Combination Optimization (CHCO), to select capable models for new tasks. Compared with the foundation model paradigm, it is less costly and more scalable, i.e., the zero-shot ability grows with the size of the model hub. Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL, demonstrating that expert models can be effectively reused for zero-shot tasks. Our code will be released publicly.
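
To make the label, select, and reuse flow concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that a model label can be simplified to a per-class pre-test score, and it substitutes a plain top-k ranking for CHCO and a flat dictionary for the SDAG, since neither is specified in this summary. All names (`MODEL_LABELS`, `select_heads`, `predict`) and all numbers are hypothetical.

```python
# Hypothetical model labels: for each expert model in the hub, a pre-test
# score for each semantic class that its classification heads cover.
MODEL_LABELS = {
    "pets_expert": {"cat": 0.95, "dog": 0.93},
    "vehicles_expert": {"car": 0.97, "truck": 0.90},
    "generalist": {"cat": 0.80, "dog": 0.78, "car": 0.82, "truck": 0.75},
}


def select_heads(model_labels, target_classes, k=1):
    """Selection step (simplified): for each target class, keep up to k
    models with the highest label score. This greedy ranking is a
    stand-in for CHCO, not a reproduction of it."""
    selection = {}
    for cls in target_classes:
        candidates = [
            (labels[cls], name)
            for name, labels in model_labels.items()
            if cls in labels
        ]
        candidates.sort(reverse=True)  # highest score first
        selection[cls] = [name for _, name in candidates[:k]]
    return selection


def predict(selection, head_scores):
    """Reuse step (simplified): average the selected heads' scores for
    each class and return the class with the highest average.
    head_scores[model][cls] is that model's score for one input."""
    averaged = {}
    for cls, models in selection.items():
        if models:  # classes with no covering model are skipped here
            averaged[cls] = sum(head_scores[m][cls] for m in models) / len(models)
    return max(averaged, key=averaged.get)


if __name__ == "__main__":
    chosen = select_heads(MODEL_LABELS, ["cat", "car"], k=1)
    scores = {
        "pets_expert": {"cat": 0.2, "dog": 0.1},
        "vehicles_expert": {"car": 0.9, "truck": 0.3},
    }
    print(chosen)
    print(predict(chosen, scores))
```

In this sketch a class that no hub model covers is simply skipped; the summary above describes falling back on a general VLM in that situation, which is omitted here for brevity.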

Paper Structure (21 sections, 13 equations, 4 figures, 2 tables, 1 algorithm)

This paper contains 21 sections, 13 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: The comparison between VLMs and MLL. MLL aligns models in the hub with their functionalities using model labels to identify capable expert models for tasks.
  • Figure 2: An overview of the MLL paradigm. In the labelling step, models submitted to the hub undergo pre-testing and receive model labels that describe their functionalities. When a user's downstream task arrives, the selection step picks useful experts, which are then assembled to handle the task in a zero-shot manner.
  • Figure 3: Accuracy and the number of models used as the reuse budget $k$ varies. Ours, LogME$_{\text{MLL}}$, and LEEP$_{\text{MLL}}$ are shown in black, blue, and orange, respectively, with the CLIP baseline as a grey dashed line. Our proposal performs robustly and better while using fewer expert models than the compared methods, which indicates higher reuse efficiency.
  • Figure 4: The performance variation with the scaling of the model hub. Our method starts from the general baseline and steadily improves as the hub is continuously enhanced.