Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights

Jonathan Kahana, Or Nathan, Eliahu Horwitz, Yedid Hoshen

TL;DR

The paper tackles the problem of finding classifiers that recognize a target concept in large public model repositories without access to training data or metadata. It introduces ProbeLog, a logit-level descriptor derived by probing fixed inputs and normalizing logit responses, and extends it to zero-shot text-based search via CLIP-like alignment. A discrepancy metric tailored for logit descriptors and a collaborative probing strategy using matrix factorization enable scalable gallery encoding with reduced computational cost. Empirical results on INet-Hub and HF-Hub show high retrieval accuracy for both in-distribution and cross-domain queries, with substantial gains over baselines and practical reductions in probe requirements. This approach offers a principled, scalable solution for practical model search that can reduce training time, cost, and environmental impact while improving access to task-specific pretrained classifiers.
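
The descriptor idea summarized above can be illustrated with a minimal sketch. It assumes a specific normalization (zero mean, unit L2 norm per descriptor) and uses cosine similarity as a stand-in for the paper's discrepancy metric; neither choice is confirmed by this summary. The "models" are toy linear classifiers over random features, and all names (`probelog_descriptors`, `retrieve`, `concepts`) are hypothetical.

```python
import numpy as np


def probelog_descriptors(logits: np.ndarray) -> np.ndarray:
    """Turn one model's probe responses into per-logit descriptors.

    logits: array of shape (n_probes, n_classes), the raw outputs of a single
    model on the fixed, ordered probe set.
    Returns an array of shape (n_classes, n_probes): row j describes logit j.
    """
    per_logit = logits.T.astype(np.float64)
    # Assumed normalization: zero mean and unit L2 norm per descriptor.
    centered = per_logit - per_logit.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.maximum(norms, 1e-12)


def retrieve(query: np.ndarray, gallery: np.ndarray, top_k: int = 1) -> np.ndarray:
    """Rank gallery descriptors by cosine similarity to the query (assumed metric)."""
    scores = gallery @ query
    return np.argsort(-scores)[:top_k]


# Toy stand-ins for real models: linear classifiers over random "probe" features.
rng = np.random.default_rng(0)
probes = rng.normal(size=(200, 16))   # 200 probes, 16 features each
concepts = rng.normal(size=(4, 16))   # 4 hypothetical concept directions

# Model A exposes concepts 0, 1, 2. Model B exposes concepts 2, 3 with a
# different output scale and bias, as two independently trained models might.
logits_a = probes @ concepts[[0, 1, 2]].T
noise = rng.normal(scale=0.1, size=(200, 2))
logits_b = 3.0 * (probes @ concepts[[2, 3]].T) + 5.0 + noise

desc_a = probelog_descriptors(logits_a)
desc_b = probelog_descriptors(logits_b)

# Query with A's logit for concept 2; the best match in B should be its logit 0.
best = retrieve(desc_a[2], desc_b, top_k=1)
print(best)
```

Because each descriptor is centered and rescaled, the differing scale and bias of model B do not affect the match; only the pattern of responses across the shared probes matters.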

Abstract

With the increasing number of publicly available models, pretrained models probably already exist online for most tasks users require. However, current model search methods are rudimentary, essentially a text-based search over the documentation, so users cannot find the relevant models. This paper presents ProbeLog, a method for retrieving classification models that can recognize a target concept, such as "Dog", without access to model metadata or training data. Unlike previous probing methods, ProbeLog computes a descriptor for each output dimension (logit) of each model by observing its responses on a fixed set of inputs (probes). Our method supports both logit-based retrieval ("find more logits like this") and zero-shot, text-based retrieval ("find all logits corresponding to dogs"). As probing-based representations require multiple costly feedforward passes through the model, we develop a method, based on collaborative filtering, that reduces the cost of encoding repositories by 3x. We demonstrate that ProbeLog achieves high retrieval accuracy in both real-world and fine-grained search tasks, and that it scales to full-size repositories.
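
To make the collaborative-filtering idea concrete, the following sketch completes a partially observed logit-by-probe response matrix with a low-rank factorization. It is an illustration under stated assumptions, not the authors' algorithm: the solver (alternating ridge regressions), the rank, the regularization, and the entry-wise random mask are all choices made here, and `complete_low_rank` is a hypothetical name. In practice one forward pass yields every logit of a model for a given probe, so real observation patterns would be structured per model rather than per entry.

```python
import numpy as np


def complete_low_rank(observed, mask, rank=3, reg=1e-2, n_iters=30, seed=0):
    """Estimate all entries of a (n_logits, n_probes) response matrix.

    observed: matrix whose entries are valid only where mask is True.
    mask: boolean array, True where a logit was actually probed.
    Solver (an assumption of this sketch): alternating ridge regressions.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = observed.shape
    row_f = rng.normal(scale=0.1, size=(n_rows, rank))
    col_f = rng.normal(scale=0.1, size=(n_cols, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        for i in range(n_rows):   # refit the factors of each logit
            f = col_f[mask[i]]
            row_f[i] = np.linalg.solve(f.T @ f + ridge, f.T @ observed[i, mask[i]])
        for j in range(n_cols):   # refit the factors of each probe
            f = row_f[mask[:, j]]
            col_f[j] = np.linalg.solve(f.T @ f + ridge, f.T @ observed[mask[:, j], j])
    return row_f @ col_f.T


# Synthetic rank-3 "responses" with only about one third of the entries probed.
rng = np.random.default_rng(1)
true = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 90))
mask = rng.random(true.shape) < 1 / 3
observed = np.where(mask, true, 0.0)

estimate = complete_low_rank(observed, mask)

# Compare against a trivial baseline that fills gaps with the observed mean.
hidden = ~mask
err_mf = np.sqrt(np.mean((estimate - true)[hidden] ** 2))
err_mean = np.sqrt(np.mean((observed[mask].mean() - true)[hidden] ** 2))
print(err_mf, err_mean)
```

The one-third observation rate is chosen only to echo the 3x cost reduction stated in the abstract; the synthetic matrix is exactly low rank, which real logit responses need not be.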

Paper Structure

This paper contains 32 sections, 6 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Hugging Face Documentation. We analyze the model cards of $1.2M$ Hugging Face models. We discover that the majority of models are either undocumented or poorly documented.
  • Figure 2: Classification Model Search. We present a new task, Classification Model Search, where the goal is to find classifiers that can recognize a target concept. Concretely, given an input prompt such as "Dog", we wish to retrieve all classifiers that include "Dog" among their classes. The search space is a large model repository containing many models and concepts. The retrieved models can replace model training, increasing accuracy and reducing cost and environmental impact.
  • Figure 3: ProbeLog Descriptors. Our method generates a descriptor for individual output dimensions (logits) of models. First, we sample a set of inputs (e.g., from the COCO dataset) and fix them as our set of probes. Then, to create a new ProbeLog descriptor for a model logit, we feed the set of ordered probes into the model and observe their outputs. Finally, we take all values of the logit we wish to represent and normalize them. We use this representation to accurately retrieve model logits associated with similar concepts. In Fig. 8, we extend this idea to zero-shot concept descriptors.
  • Figure 4: CIFAR10 Logit Similarities. (a) Ground truth label. (b) ProbeLog representations using $1,000$ out-of-distribution COCO image probes. (c) ProbeLog representations using $1,000$ in-distribution CIFAR10 image probes. Both find meaningful similarities, although in-distribution probes work better.
  • Figure 8: Text-Aligned ProbeLog Representation. We present a method to create ProbeLog-like representations for text prompts. We encode and store each of our ordered probes using the CLIP image encoder. At inference time, we embed the target text prompt, and compute its similarity with respect to the stored probe representations. We demonstrate that by normalizing this zero-shot ProbeLog descriptor, we can effectively search descriptors of real model logits, accurately retrieving similar concepts.
  • ...and 3 more figures
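
The text-aligned descriptor described in the Figure 8 caption can be sketched as follows. This is a minimal illustration, assuming the probe and text embeddings have already been computed by a CLIP-style image and text encoder; here random vectors stand in for those embeddings, and the "classifier logits" are synthetic. The normalization (zero mean, unit norm) and the cosine-similarity ranking are assumptions of the sketch, and all names are hypothetical.

```python
import numpy as np


def normalize(v: np.ndarray) -> np.ndarray:
    """Assumed descriptor normalization: zero mean, unit L2 norm."""
    centered = v - v.mean()
    return centered / max(np.linalg.norm(centered), 1e-12)


def text_descriptor(text_emb: np.ndarray, probe_embs: np.ndarray) -> np.ndarray:
    """Zero-shot descriptor: similarity of one text embedding to every stored probe."""
    text_unit = text_emb / np.linalg.norm(text_emb)
    probe_unit = probe_embs / np.linalg.norm(probe_embs, axis=1, keepdims=True)
    return normalize(probe_unit @ text_unit)


# Synthetic stand-ins for encoder outputs (a real system would use CLIP here).
rng = np.random.default_rng(0)
probe_embs = rng.normal(size=(300, 32))   # stored image embeddings of the probes
dog_text_emb = rng.normal(size=32)        # embedding of the prompt "Dog"
other_dir = rng.normal(size=32)           # an unrelated concept direction

# Two hypothetical classifier logits evaluated on the same ordered probes:
# logit 0 tracks the "Dog" direction (with its own scale, bias, and noise),
# logit 1 tracks the unrelated direction.
logit_dog = 4.0 * (probe_embs @ dog_text_emb) + 1.5 + rng.normal(scale=0.5, size=300)
logit_other = probe_embs @ other_dir
gallery = np.stack([normalize(logit_dog), normalize(logit_other)])

# Search the gallery of real-logit descriptors with the text-derived descriptor.
query = text_descriptor(dog_text_emb, probe_embs)
scores = gallery @ query
print(int(np.argmax(scores)))
```

The key design point is that the text descriptor lives in the same space as the model descriptors: both are vectors indexed by the same ordered probes, so no access to the searched models' training data or metadata is needed at query time.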