Feature Space Perturbation: A Panacea to Enhanced Transferability Estimation

Prafful Kumar Khoba, Zijian Wang, Chetan Arora, Mahsa Baktashmotlagh

TL;DR

This work targets robust model ranking for transfer learning by introducing a feature space perturbation method, Spread and Attract (SA), that tests the resilience of embeddings and improves transferability estimates. SA perturbs intra-class structure via Spread and blurs inter-class boundaries via Attract. The resulting perturbed embeddings $\mathbf{\hat{X}}_{\text{attract}}$ are fed to a transferability metric $\mathcal{M}$ to produce the scores $T_l$ used for ranking, and PCA is used to manage dimensionality. Empirically, SA yields substantial gains under vanilla fine-tuning, last-block fine-tuning (LBFT), and linear fine-tuning (LFT), e.g., up to a $28.84\%$ improvement for LogMe and notable gains across multiple other metrics. For self-supervised models, an LDA-based metric outperforms the state of the art by $12.7\%$ (vanilla FT) and $15.06\%$ (LFT). The results highlight robustness as a critical dimension of transferability estimation and suggest promising directions for adaptive metrics that jointly optimize adaptability and robustness across supervised and self-supervised paradigms.
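The ranking pipeline summarized above (PCA, then perturbation, then a metric $\mathcal{M}$ that produces one score per model, then sorting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `rank_models`, `pca_reduce`, and `fisher_ratio` are hypothetical. The Fisher-ratio score is a simple stand-in for a real metric such as LogMe, and the PCA dimension `k` is arbitrary. Spread and Attract would be passed in as the `perturb` callable. The usage below passes the identity so the block stays self-contained.

```python
import numpy as np
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases: a metric maps (features, labels) -> score,
# a perturbation maps (features, labels) -> perturbed features.
Metric = Callable[[np.ndarray, np.ndarray], float]
Perturb = Callable[[np.ndarray, np.ndarray], np.ndarray]


def pca_reduce(x: np.ndarray, k: int) -> np.ndarray:
    """Project centered features onto their top-k principal directions (via SVD)."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[: min(k, vt.shape[0])].T


def fisher_ratio(x: np.ndarray, y: np.ndarray) -> float:
    """Stand-in metric (assumption): between-class over within-class scatter."""
    mu = x.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(y):
        xc = x[y == c]
        mc = xc.mean(axis=0)
        between += len(xc) * float(np.sum((mc - mu) ** 2))
        within += float(np.sum((xc - mc) ** 2))
    return between / max(within, 1e-12)


def rank_models(features: Dict[str, np.ndarray], labels: np.ndarray,
                metric: Metric, perturb: Perturb, k: int = 32
                ) -> Tuple[List[str], Dict[str, float]]:
    """Score every candidate model on perturbed, PCA-reduced target features."""
    scores = {}
    for name, x in features.items():
        z = pca_reduce(x, k)            # manage dimensionality
        z_hat = perturb(z, labels)      # e.g. Spread followed by Attract
        scores[name] = metric(z_hat, labels)
    order = sorted(scores, key=scores.get, reverse=True)  # best model first
    return order, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(3), 50)
    centers = 5.0 * rng.normal(size=(3, 64))
    features = {  # synthetic stand-ins for two models' target embeddings
        "model_a": centers[labels] + rng.normal(size=(150, 64)),        # tight classes
        "model_b": centers[labels] + 4.0 * rng.normal(size=(150, 64)),  # diffuse classes
    }
    order, scores = rank_models(features, labels, fisher_ratio,
                                perturb=lambda z, y: z, k=16)
    print(order)  # ['model_a', 'model_b']
```

Keeping the perturbation as a plug-in argument means the same loop can score models before and after perturbation, which is the comparison the paper's correlation charts make.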

Abstract

Transferability estimation metrics help address the non-trivial challenge of selecting the optimal pre-trained model for a downstream task from a pool of candidates. Most existing metrics focus primarily on the statistical relationship between feature embeddings and the corresponding labels within the target dataset, but overlook the crucial aspect of model robustness. This oversight may limit their effectiveness in accurately ranking pre-trained models. To address this limitation, we introduce a feature perturbation method that enhances the transferability estimation process by systematically altering the feature space. Our method includes a Spread operation that increases intra-class variability, adding complexity within classes, and an Attract operation that reduces the distances between different classes, thereby blurring the class boundaries. Through extensive experimentation, we demonstrate the efficacy of our feature perturbation method in providing a more precise and robust estimation of model transferability. Notably, the existing LogMe method exhibited a significant improvement, showing a 28.84% increase in performance after applying our feature perturbation method.
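The abstract describes Spread as increasing intra-class variability and Attract as pulling classes together. The exact update rules and the roles of the hyper-parameters $\alpha$ and $\sigma$ (Figure 5) are not given in this summary. The sketch below therefore assumes simple linear forms:

- Spread scales each embedding's offset from its class mean by $1 + \alpha$.
- Attract moves each class a fraction $\beta$ of the way toward the global mean.

The names `spread`, `attract`, `spread_attract`, `alpha`, and `beta` are illustrative and need not match the paper's parameters. Attract is applied last, matching the name $\mathbf{\hat{X}}_{\text{attract}}$ used for the final perturbed embeddings.

```python
import numpy as np


def class_means(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return, for every sample, the mean of the class it belongs to."""
    classes, idx = np.unique(y, return_inverse=True)
    means = np.stack([x[y == c].mean(axis=0) for c in classes])
    return means[idx]


def spread(x: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical Spread: enlarge each sample's offset from its class mean
    by (1 + alpha). Class means stay fixed; intra-class variability grows."""
    mu_c = class_means(x, y)
    return mu_c + (1.0 + alpha) * (x - mu_c)


def attract(x: np.ndarray, y: np.ndarray, beta: float) -> np.ndarray:
    """Hypothetical Attract: shift each class rigidly so its mean moves a
    fraction beta toward the global mean. Distances between class means
    shrink by (1 - beta); beta = 1 collapses them onto the global mean."""
    mu_c = class_means(x, y)
    mu = x.mean(axis=0, keepdims=True)
    return x - beta * (mu_c - mu)


def spread_attract(x: np.ndarray, y: np.ndarray,
                   alpha: float = 0.5, beta: float = 0.5) -> np.ndarray:
    """Combined perturbation: Spread first, then Attract."""
    return attract(spread(x, y, alpha), y, beta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 100)
    x = np.where(y[:, None] == 0, -3.0, 3.0) + rng.normal(size=(200, 2))

    def gap(z: np.ndarray) -> float:  # distance between the two class means
        return float(np.linalg.norm(z[y == 0].mean(axis=0) - z[y == 1].mean(axis=0)))

    print(gap(x))                                          # original separation
    print(gap(spread_attract(x, y, alpha=0.5, beta=0.5)))  # half the original gap
    print(gap(attract(x, y, beta=1.0)))                    # ~0: class means coincide
```

The last line corresponds to the excessive regime that Figure 2(c) illustrates. Once class means coincide, the perturbed embeddings no longer carry the class structure a metric needs, which is why the perturbation strength has to be controlled.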

Paper Structure

This paper contains 20 sections, 11 equations, 8 figures, 12 tables, 1 algorithm.

Figures (8)

  • Figure 1: Illustration of our feature perturbation method for transferability estimation. (a) Flowchart of the enhanced transferability estimation process; bold elements are our perturbation steps, and the remaining elements show the pipeline traditionally used in existing transferability estimation work. (b) Initial embeddings with large inter-class separation and compact intra-class clusters, typical of supervised models. (c) Embeddings after our feature perturbation. (d) and (e) Correlation charts and the weighted Kendall correlation coefficient $\tau_w$ on the Pets dataset. Each chart plots predicted rankings against actual rankings before and after perturbation, with each symbol representing one model. The shift from lower to higher correlation values highlights the improved accuracy of model rankings after applying our perturbation method.
  • Figure 2: Toy example demonstrating the importance of controlled perturbation in feature space manipulation. (a) The initial target embedding. (b) An appropriate amount of feature perturbation. (c) An excessive level of feature perturbation.
  • Figure 3: Visualization of ResNet50 target embeddings before feature perturbation (best viewed in color). (a) Datasets with mixed improvement across metrics, as reported in the supervised fine-tuning table (tab:sup_fine-tune); the class overlap in (a) contributes to the varied performance across metrics. (b) Datasets with consistent improvement across all evaluated metrics, facilitated by well-separated embeddings. This distinction underscores the role of embedding structure in transferability estimation.
  • Figure 4: Bar chart of the performance improvement of each feature perturbation operation over the original baseline. Each metric has four bars, one per operation: Original, Spread, Attract, and combined Spread-Attract. The combined approach significantly outperforms the others.
  • Figure 5: Hyper-parameter sensitivity analysis. The left figure shows consistent performance ($\tau_w$) across a wide range of $\alpha$ values at the optimal $\sigma$, indicating insensitivity to this hyper-parameter. The right figure shows limited variance in performance ($\tau_w$) across $\sigma$ values from 0.5 to 0.9 at the optimal $\alpha$.
  • ...and 3 more figures
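Figures 1 and 5 report ranking quality as the weighted Kendall coefficient $\tau_w$ between predicted and actual model rankings. Below is a minimal sketch of how such a score is commonly computed. It assumes SciPy's `scipy.stats.weightedtau` with its default hyperbolic weighting; the exact $\tau_w$ configuration used in the paper is not stated in this summary. The helper name `ranking_quality` is hypothetical, and the accuracy values are made-up placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import weightedtau


def ranking_quality(predicted_scores, finetuned_accuracy) -> float:
    """Weighted Kendall tau between transferability scores and ground-truth
    fine-tuning accuracies. SciPy's default hyperbolic weigher gives more
    weight to disagreements among the top-ranked models."""
    tau, _ = weightedtau(np.asarray(predicted_scores, dtype=float),
                         np.asarray(finetuned_accuracy, dtype=float))
    return float(tau)


if __name__ == "__main__":
    # Made-up accuracies for five hypothetical models (illustration only).
    acc = [0.91, 0.87, 0.80, 0.72, 0.65]
    print(ranking_quality([5, 4, 3, 2, 1], acc))  # perfect agreement: tau = 1
    print(ranking_quality([4, 5, 3, 2, 1], acc))  # top two models swapped
    print(ranking_quality([5, 4, 3, 1, 2], acc))  # bottom two swapped: higher than the line above
```

A swap among the top-ranked models lowers $\tau_w$ more than the same swap at the bottom of the list. This suits model selection, where identifying the best candidates matters most.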