In Search of the Successful Interpolation: On the Role of Sharpness in CLIP Generalization

Alireza Abdollahpoorrostam

TL;DR

This work empirically investigates the robustness of \texttt{RFT} in CLIP models, focusing on the \textit{sharpness} of the model along the interpolation path, and studies the role of sharpness in the success of weight-space interpolation for CLIP foundation models.

Abstract

\textit{Zero-shot} models like CLIP are often fine-tuned on a target dataset to further improve their accuracy, but this can compromise out-of-distribution (OOD) robustness. Robust Fine-Tuning (\texttt{RFT})~\citep{wortsman2021robust}, which interpolates between the \textit{zero-shot} and \textit{fine-tuned} models, has been proposed to address this issue. However, it remains poorly understood when \texttt{RFT} actually improves OOD error. In this work, we empirically investigate the robustness of \texttt{RFT} in CLIP models, with a focus on the \textit{sharpness} of the CLIP model during interpolation. First, we demonstrate that overall sharpness may not serve as a reliable indicator of the generalization of modern architectures like CLIP on OOD data, which challenges the conventional belief in the generalization benefits of flat minima in foundation models. However, by examining the \textit{straggler layer} phenomenon, we show that, unlike overall sharpness, the \textit{layer-wise} sharpness of \textit{straggler} layers can reliably capture the generalization performance of interpolated CLIP models on OOD data. Our extensive experiments reveal that \textit{layer-wise} sharpness correlates with OOD accuracy for \texttt{RFT}. Furthermore, we demonstrate that inducing sparsity in the \textit{straggler} layers mitigates the \textit{failure mode} phenomenon in \texttt{RFT}. To the best of our knowledge, this is the first work to study the role of sharpness in the \textit{success} of interpolation in the weight space of CLIP foundation models. Our code is available at \url{https://github.com/alirezaabdollahpour/CLIP_Mode_Connectivity}.
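The interpolation between the zero-shot and fine-tuned models described above can be illustrated with a minimal sketch. It assumes the usual linear form $\theta_\alpha = (1-\alpha)\,\theta_{\text{zs}} + \alpha\,\theta_{\text{ft}}$ applied parameter by parameter; the names `interpolate`, `theta_zs`, `theta_ft`, and the toy weights are hypothetical and are not taken from the paper's code.

```python
from typing import Dict, List

# A "model" here is just a mapping from parameter name to a flat list of floats.
# This is an illustrative stand-in for a real CLIP state dict (assumption).
Weights = Dict[str, List[float]]


def interpolate(zero_shot: Weights, fine_tuned: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * zero_shot + alpha * fine_tuned, parameter by parameter."""
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("both models must share the same parameter names")
    return {
        name: [
            (1.0 - alpha) * z + alpha * f
            for z, f in zip(zero_shot[name], fine_tuned[name])
        ]
        for name in zero_shot
    }


# Toy, hypothetical parameters for two layers.
theta_zs = {"layer0": [0.0, 1.0], "layer1": [2.0, 2.0]}
theta_ft = {"layer0": [1.0, 3.0], "layer1": [2.0, 4.0]}

# alpha = 0 recovers the zero-shot model, alpha = 1 the fine-tuned one.
midpoint = interpolate(theta_zs, theta_ft, 0.5)
```

Sweeping `alpha` over a grid in $[0, 1]$ and evaluating each interpolated model on an OOD set would trace the kind of accuracy and loss path that the study examines.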

Paper Structure

This paper contains 8 sections, 1 theorem, 9 equations, 5 figures, 1 algorithm.

Key Result

Proposition A.1

(andriushchenko2023modernlookrelationshipsharpness) Let $L_\mathcal{S}\in C^3(\mathbb{R}^s)$, $S$ be a finite sample of points $(x_i,y_i)_{i=1}^n$ and let $P_m$ denote the uniform distribution over subsamples of size $m\leq n$ from $S$. Then

Figures (5)

  • Figure 1: For 9 distinct CLIP models fine-tuned on ImageNet deng2009imagenet (each color denotes a different model), this plot shows the accuracy and loss on ImageNet-A hendrycks2021natural as an OOD task. For each model, we show the maximum accuracy gain achieved along the corresponding interpolation path. In the loss plot, we show depth as the largest barrier on the interpolation path starting from the zero-shot model.
  • Figure 2: Layer-wise interpolation on ImageNet-A as OOD. For two distinct fine-tuned CLIP models, one exhibiting failure mode and the other a high accuracy gain under regular interpolation (RFT), we interpolate each layer individually with the corresponding layer of the zero-shot CLIP model.
  • Figure 3: For 9 distinct CLIP models fine-tuned on ImageNet (each color denotes a different model), this plot shows the general adaptive average sharpness with $\rho=1.0$ and $20$ iterations on ImageNet-A as an OOD task.
  • Figure 4: We present an analysis of the layer-wise sharpness across four distinct CLIP models, comprising two failure mode models and two high gain accuracy models, demonstrating the sharpness characteristics of each individual layer.
  • Figure 5: Straggler layer pruning. For two distinct fine-tuned CLIP models that exhibit failure mode during interpolation using the RFT algorithm, we demonstrate that pruning the straggler layers of the fine-tuned model prevents a collapse in performance.
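
The layer-wise interpolation (Figure 2) and straggler layer pruning (Figure 5) described in the captions above can be sketched as follows. This is only an illustration under stated assumptions: the captions do not specify what the non-interpolated layers are set to, so the sketch keeps them at their fine-tuned values, and it uses simple magnitude pruning as a stand-in for however sparsity is actually induced. The function names and toy weights are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def interpolate_layer(
    zero_shot: Weights, fine_tuned: Weights, layer: str, alpha: float
) -> Weights:
    """Interpolate only `layer` toward the fine-tuned weights.

    Assumption: every other layer keeps its fine-tuned value.
    """
    if layer not in fine_tuned or layer not in zero_shot:
        raise KeyError(layer)
    mixed = {name: list(values) for name, values in fine_tuned.items()}
    mixed[layer] = [
        (1.0 - alpha) * z + alpha * f
        for z, f in zip(zero_shot[layer], fine_tuned[layer])
    ]
    return mixed


def prune_layer(weights: Weights, layer: str, sparsity: float) -> Weights:
    """Zero the smallest-magnitude fraction `sparsity` of entries in `layer`.

    Assumption: plain magnitude pruning, used here only for illustration.
    """
    values = weights[layer]
    k = int(sparsity * len(values))
    # Indices of the k entries with the smallest absolute value.
    drop = set(sorted(range(len(values)), key=lambda i: abs(values[i]))[:k])
    pruned = {name: list(v) for name, v in weights.items()}
    pruned[layer] = [0.0 if i in drop else v for i, v in enumerate(values)]
    return pruned


# Toy, hypothetical weights; "b" plays the role of a straggler layer.
theta_zs = {"a": [0.0, 0.0], "b": [1.0, 1.0, 1.0, 1.0]}
theta_ft = {"a": [2.0, 4.0], "b": [0.1, -3.0, 0.2, 2.0]}

mixed_a = interpolate_layer(theta_zs, theta_ft, "a", 0.5)
sparse_ft = prune_layer(theta_ft, "b", 0.5)
```

In this sketch the pruned fine-tuned model `sparse_ft` would then be passed to the regular interpolation with the zero-shot model, mirroring the order of operations the Figure 5 caption suggests.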

Theorems & Definitions (4)

  • Definition 1.1
  • Definition 1.2
  • Proposition A.1
  • proof