Towards Fundamentally Scalable Model Selection: Asymptotically Fast Update and Selection

Wenxiao Wang; Weiming Zhuang; Lingjuan Lyu

TL;DR

This work defines isolated model embedding, a family of model selection schemes supporting asymptotically fast update and selection, and presents Standardized Embedder, an empirical realization of isolated model embedding.

Abstract

The advancement of deep learning technologies is bringing new models every day, motivating the study of scalable model selection. An ideal model selection scheme should minimally support two operations efficiently over a large pool of candidate models: update, which involves either adding a new candidate model or removing an existing candidate model, and selection, which involves locating high-performing models for a given task. However, previous solutions to model selection incur high computational complexity for at least one of these two operations. In this work, we target fundamentally (more) scalable model selection that supports asymptotically fast update and asymptotically fast selection at the same time. First, we define isolated model embedding, a family of model selection schemes supporting asymptotically fast update and selection: with respect to the number of candidate models $m$, the update complexity is O(1) and the selection consists of a single sweep over $m$ vectors in addition to O(1) model operations. Isolated model embedding also implies several desirable properties for applications. Second, we present Standardized Embedder, an empirical realization of isolated model embedding. We assess its effectiveness by using it to select representations from a pool of 100 pre-trained vision models for classification tasks and measuring the performance gaps between the selected models and the best candidates with a linear probing protocol. Experiments suggest that our realization is effective in selecting models with competitive performance and highlight isolated model embedding as a promising direction towards model selection that is fundamentally (more) scalable.
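The complexity profile described in the abstract can be illustrated with a minimal sketch. It assumes each candidate's embedding has already been computed from that model alone (the O(1) model operations), so the index only stores vectors. The class `IsolatedModelIndex` and the dot-product score are hypothetical placeholders, not the authors' implementation. Update is a single dictionary insertion or deletion, O(1) on average in m. Selection is one pass over the m stored vectors.

```python
import heapq
from typing import Callable, Dict, List, Tuple

import numpy as np

# Hypothetical score: similarity between a task embedding and a model embedding.
Score = Callable[[np.ndarray, np.ndarray], float]


class IsolatedModelIndex:
    """Hypothetical registry with the cost profile of isolated model embedding.

    Each stored embedding depends only on its own model, so adding or removing
    a candidate never touches the other entries.
    """

    def __init__(self, score: Score) -> None:
        self._embeddings: Dict[str, np.ndarray] = {}
        self._score = score

    def __len__(self) -> int:
        return len(self._embeddings)

    def add(self, name: str, embedding: np.ndarray) -> None:
        # Update: one dict insertion, O(1) on average in the number of candidates m.
        self._embeddings[name] = embedding

    def remove(self, name: str) -> None:
        # Update: one dict deletion, O(1) on average in m.
        del self._embeddings[name]

    def select(self, task_embedding: np.ndarray, k: int = 1) -> List[Tuple[str, float]]:
        # Selection: a single sweep over the m stored vectors; no candidate model is run.
        scored = ((name, self._score(task_embedding, emb))
                  for name, emb in self._embeddings.items())
        # Keep the k best while sweeping (O(m log k)).
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Usage with a placeholder dot-product score (an assumption for illustration).
index = IsolatedModelIndex(score=lambda t, e: float(np.dot(t, e)))
index.add("model_a", np.array([1.0, 0.0]))
index.add("model_b", np.array([0.0, 1.0]))
index.add("model_c", np.array([0.6, 0.8]))
index.remove("model_a")
print(index.select(np.array([0.0, 1.0]), k=1))
```

The scoring function is kept pluggable because the abstract fixes only the cost structure (O(1) update, one sweep over m vectors), not how a task embedding is compared with a model embedding.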

Paper Structure

This paper contains 21 sections, 3 equations, 8 figures, and 3 tables.

Introduction
Related Work
A Family of Model Selection with Asymptotically Fast Update and Selection
Formal Definition
Asymptotically Fast Update and Selection
Other Desirable Properties
Standardized Embedder: A Realization of Isolated Model Embedding
Tool: (Approximate) Functionality Equivalence
Preprocessing: Isolated Model Embedding by Identifying Equivalent Feature Subsets
Selection: Task Embedding through Feature Sifting
Evaluation
Evaluation Setup
The Performance of Standardized Embedder
On the Choice of Baseline Features
Choosing Sparsity Level in Task Embedding
...and 6 more sections

Figures (8)

Figure 1: Illustrations for different families of model selection schemes. Isolated model embedding (ours) is a family that supports asymptotically fast update and selection at the same time.
Figure 2: An illustration of Standardized Embedder. (a) Preprocessing: Using features of a public model as the baseline, a vector embedding is learned independently for each pre-trained model. Intuitively, the embeddings of models denote their approximately equivalent feature subsets in the baseline features. (b) Selection: Task embeddings are defined by the subsets of the baseline features that are important to the corresponding downstream tasks, which are identified by enforcing sparsity regularization. Models are selected by comparing the task embedding with the model embeddings of all candidates, using (the cardinality of) the standard fuzzy set intersection as the selection metric.
Figure 3: (a, b) Downstream accuracy (i.e., the ground truth) on CIFAR-10 vs. the cardinality of standard intersections (i.e., the selection metric) when using 4k steps per candidate. The downstream accuracy of the baseline features is highlighted with the dashed line. When the public model is only suboptimal, Standardized Embedder using it as baseline features can still locate more competitive models. See Figures 4 and 5 in the Appendix for more results, including other downstream tasks and more steps. (c) The cardinality of standard intersections (i.e., the selection metric) when using different baseline features (ResNet-18 and Swin-T (tiny)), with 4k steps per candidate and CIFAR-10 as the downstream task. The green/orange points in the bottom right suggest that using ResNet-18 as baseline features tends to overestimate (some) models with attention compared to using Swin Transformer (tiny). See the corresponding figures in the Appendix for more results, including other downstream tasks and more steps. (d) Downstream accuracy on CIFAR-10 of the baseline features (ResNet-18) under varying sparsity regularization $\gamma$. A rule of thumb for choosing $\gamma$: use the smallest $\gamma$ that yields an accuracy drop of at least 3% from the converged accuracy. See the Appendix for more results.
Figure 4: Downstream accuracy (i.e., the ground truth) vs. the cardinality of standard intersections (i.e., the selection metric) when using 4k steps per candidate. The downstream accuracy of the baseline features is highlighted with the dashed line. When the public model is only suboptimal, Standardized Embedder using it as baseline features can still locate more competitive models.
Figure 5: Downstream accuracy (i.e., the ground truth) vs. the cardinality of standard intersections (i.e., the selection metric) when using 10k steps per candidate. The downstream accuracy of the baseline features is highlighted with the dashed line. When the public model is only suboptimal, Standardized Embedder using it as baseline features can still locate more competitive models.
...and 3 more figures
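The selection metric from Figure 2 and the rule of thumb for $\gamma$ from Figure 3(d) can be sketched as follows. This sketch rests on several assumptions. Embeddings are treated as membership degrees in [0, 1] over the baseline features. The standard fuzzy intersection is taken elementwise as the minimum, and its cardinality is the sum of memberships (sigma-count). The "3% drop" is read as 3 percentage points below a user-supplied converged accuracy. The function names are hypothetical, and this is not the authors' code.

```python
from typing import List, Mapping

import numpy as np


def standard_intersection_cardinality(task_emb: np.ndarray, model_emb: np.ndarray) -> float:
    """Cardinality (sigma-count) of the standard fuzzy intersection: sum_i min(a_i, b_i).

    Assumes both vectors hold membership degrees in [0, 1] over the same baseline features.
    """
    return float(np.minimum(task_emb, model_emb).sum())


def rank_candidates(task_emb: np.ndarray, model_embs: Mapping[str, np.ndarray]) -> List[str]:
    """Order candidate names by the selection metric, best first."""
    scores = {name: standard_intersection_cardinality(task_emb, emb)
              for name, emb in model_embs.items()}
    return sorted(scores, key=scores.__getitem__, reverse=True)


def pick_gamma(acc_by_gamma: Mapping[float, float], converged_acc: float,
               min_drop: float = 3.0) -> float:
    """Smallest gamma whose accuracy is at least `min_drop` points below `converged_acc`.

    Assumes accuracies are in percent and the drop is measured in absolute points.
    """
    for gamma in sorted(acc_by_gamma):
        if converged_acc - acc_by_gamma[gamma] >= min_drop:
            return gamma
    raise ValueError("no gamma in the sweep reaches the required accuracy drop")


# Toy usage (made-up numbers, not results from the paper).
task = np.array([0.9, 0.1, 0.0])
models = {"a": np.array([1.0, 0.0, 0.5]), "b": np.array([0.2, 0.8, 1.0])}
print(rank_candidates(task, models))
print(pick_gamma({1e-4: 94.5, 1e-3: 93.0, 1e-2: 91.0, 1e-1: 70.0}, converged_acc=95.0))
```

Scanning candidate $\gamma$ values in increasing order makes "smallest $\gamma$ with at least a 3% drop" a direct first-match search over a sweep. What counts as the converged accuracy is left to the caller.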

Theorems & Definitions (2)

Definition 4.1: Functionality Equivalence
Definition 4.2: Vector Embedding through Equivalent Feature Subsets
