Bayesian-guided Label Mapping for Visual Reprogramming

Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu

TL;DR

One-to-one label mappings may overlook the complex relationship between pretrained and downstream labels. This paper proposes Bayesian-guided Label Mapping (BLM), a probabilistic alternative that also offers a lens for understanding and analyzing the effectiveness of visual reprogramming (VR).

Abstract

Visual reprogramming (VR) leverages the intrinsic capabilities of pretrained vision models by adapting their input or output interfaces to solve downstream tasks whose labels (i.e., downstream labels) might be totally different from the labels associated with the pretrained models (i.e., pretrained labels). When adapting the output interface, label mapping methods transform the pretrained labels to downstream labels by establishing a gradient-free one-to-one correspondence between the two sets of labels. However, in this paper, we reveal that one-to-one mappings may overlook the complex relationship between pretrained and downstream labels. Motivated by this observation, we propose a Bayesian-guided Label Mapping (BLM) method. BLM constructs an iteratively-updated probabilistic label mapping matrix, with each element quantifying a pairwise relationship between pretrained and downstream labels. The assignment of values to the constructed matrix is guided by Bayesian conditional probability, considering the joint distribution of the downstream labels and the labels predicted by the pretrained model on downstream samples. Experiments conducted on both pretrained vision models (e.g., ResNeXt) and vision-language models (e.g., CLIP) demonstrate the superior performance of BLM over existing label mapping methods. The success of BLM also offers a probabilistic lens through which to understand and analyze the effectiveness of VR. Our code is available at https://github.com/tmlr-group/BayesianLM.
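To make the idea of a probabilistic label mapping matrix concrete, here is a minimal sketch that estimates a conditional probability for every (pretrained label, downstream label) pair from joint counts of predicted and ground-truth labels. It is not the authors' implementation: the function name `estimate_mapping`, the direction of the conditional (downstream label given predicted pretrained label), and the additive `smoothing` term are all assumptions made for illustration. The repository linked above holds the actual method.

```python
from collections import Counter


def estimate_mapping(pred_pretrained, true_downstream, k_s, k_t, smoothing=1.0):
    """Return a k_s x k_t matrix; entry [s][t] estimates p(y_T = t | y_S = s).

    pred_pretrained: pretrained labels predicted on downstream samples.
    true_downstream: ground-truth downstream labels for the same samples.
    smoothing: hypothetical additive smoothing; must be > 0 so that
               pretrained labels that are never predicted stay well defined.
    """
    if smoothing <= 0:
        raise ValueError("smoothing must be positive")
    # Joint counts of (predicted pretrained label, true downstream label).
    joint = Counter(zip(pred_pretrained, true_downstream))
    omega = []
    for s in range(k_s):
        row_total = sum(joint[(s, t)] for t in range(k_t))
        denom = row_total + smoothing * k_t
        omega.append([(joint[(s, t)] + smoothing) / denom for t in range(k_t)])
    return omega


# Toy data: 3 pretrained labels, 2 downstream labels, 6 downstream samples.
pred = [0, 0, 1, 2, 2, 2]
true = [0, 0, 0, 1, 1, 0]
omega = estimate_mapping(pred, true, k_s=3, k_t=2)
```

Unlike a one-to-one mapping, every pretrained label here contributes some weight to every downstream label. In the iterative setting described in the abstract, such a matrix would be re-estimated as the model's predictions on downstream samples change.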

Paper Structure

This paper contains 36 sections, 5 theorems, 27 equations, 22 figures, 9 tables, 9 algorithms.

Key Result

Lemma E.3

Given a collection of paired labels $\{ (y^{\rm S}, y^{\rm T}) \}_{i=1}^{n}$. If the aggregate conditional probabilities $p(y^{\rm S} = 1 | y^{\rm T} = 0) \geq p(y^{\rm S} = 0 | y^{\rm T} = 0)$ and $p(y^{\rm S} = 0 | y^{\rm T} = 1) \geq p(y^{\rm S} = 1 | y^{\rm T} = 1)$ hold true, and considering $f

Figures (22)

  • Figure 1: Drawbacks of one-to-one LM from the perspectives of (a) individual images and (b) the entire dataset. An ImageNet-pretrained classifier is reused in downstream tasks. In (a), images 'Dog' and 'Osteospermum' from downstream tasks are mapped into only one pretrained label, respectively, ignoring other probabilities. In (b), the distribution of [predicted pretrained label $y^{\rm S}$, ground-truth downstream label $y^{\rm T}$] pairs reveals the existence of suboptimal solutions, where 'Automobile' cannot be paired with the optimal pretrained label 'Moving Van', which has already been mapped to 'Truck'.
  • Figure 2: Learning strategy of BLM and BLM+. First, input images, incorporated with VR watermarking or padding patterns, are fed into a fixed pretrained model to obtain logits and predicted labels. Then, the true labels (of $y^{\rm T}$) and predicted labels (of $y^{\rm S}$) are used to estimate $\omega_{\rm BLM}$ or $\omega_{\rm BLM_{\rm +}}$. Next, using $\omega_{\rm BLM}$ or $\omega_{\rm BLM_{\rm +}}$ that reweights output logits of pretrained models for the downstream labels, the predicted results can be derived. Finally, backpropagation is performed to update the input VR.
  • Figure 3: Visualization results of top weighted pretrained labels $y^{\rm S}$ and weights $\omega_{y^{\rm S},y^{\rm T}}$ for some $y^{\rm T}$ applying BLM and BLM+. Downstream labels 'Edamame', 'Fibrous', and 'Dog' are shown as examples. ResNet-18 pretrained on ImageNet is used. More results are in Appendix \ref{app:vis}.
  • Figure 4: Visualization of input VR and top-weighted pretrained labels applying BLM+. Training loss and weight changes (Euclidean norm) of probabilistic LM $\omega_{\rm BLM+}$ per iteration are plotted below. Pretrained ResNet-18 is used, and the downstream label 'Marigold' is selected as an example.
  • Figure 5: Accuracy improvement (%) of BLM and BLM+ compared with ILM given different sizes ($k_{\rm T}$) of the downstream label space, using pretrained ResNet-18.
  • ...and 17 more figures
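
The reweighting step in the Figure 2 caption can be sketched as a weighted sum of pretrained logits, with a one-to-one mapping as the special case where the weight matrix contains only zeros and ones. This is an illustrative sketch under assumptions: `reweight_logits` and `one_to_one_matrix` are hypothetical helper names, the numbers are made up, and the final backpropagation step that updates the input VR is omitted because it needs an autodiff framework.

```python
def reweight_logits(pretrained_logits, omega):
    """Map k_s pretrained logits to k_t downstream logits.

    omega is a k_s x k_t weight matrix; downstream logit t is the sum over
    pretrained labels s of pretrained_logits[s] * omega[s][t].
    """
    k_t = len(omega[0])
    return [
        sum(pretrained_logits[s] * omega[s][t] for s in range(len(omega)))
        for t in range(k_t)
    ]


def one_to_one_matrix(assignment, k_s):
    """Build a 0/1 matrix where assignment[t] is the single pretrained label
    used for downstream label t (a deterministic, one-to-one style mapping)."""
    omega = [[0.0] * len(assignment) for _ in range(k_s)]
    for t, s in enumerate(assignment):
        omega[s][t] = 1.0
    return omega


logits = [2.0, 1.0, 0.5]                       # made-up pretrained logits
hard = one_to_one_matrix([0, 2], k_s=3)        # label 1 is ignored entirely
soft = [[0.75, 0.25], [0.5, 0.5], [0.0, 1.0]]  # made-up probabilistic weights

hard_out = reweight_logits(logits, hard)
soft_out = reweight_logits(logits, soft)
```

With the hard mapping, the logit of pretrained label 1 never reaches the downstream prediction. With the soft weights it contributes to both downstream labels, which is the kind of ignored information Figure 1(a) points to.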

Theorems & Definitions (10)

  • Definition E.1: probabilistic label mapping (PLM)
  • Definition E.2: deterministic label mapping (DLM)
  • Lemma E.3
  • Lemma E.4
  • Corollary E.5
  • Remark E.6
  • Lemma E.7: cf. Lemma \ref{lemma:identity}
  • proof
  • Lemma E.8: cf. Lemma \ref{lemma:flip}
  • proof