Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning

Zhuoyan Xu; Zhenmei Shi; Junyi Wei; Fangzhou Mu; Yin Li; Yingyu Liang

TL;DR

This work analyzes how multitask finetuning of foundation models can improve adaptation to new tasks with limited labels. It introduces a theoretical framework that links downstream generalization to the diversity of finetuning tasks and their consistency with the target task, providing an explicit bound that motivates task selection. A practical greedy task-selection algorithm is proposed, supported by a linear-case analysis showing how diversity and consistency correspond to feature coverage and alignment. Empirically, multitask finetuning yields consistent gains across vision and language benchmarks, with larger improvements when auxiliary data are diverse yet aligned with the target, and when target-domain similarities are exploited. The findings offer a principled path to better few-shot adaptation in real-world settings and include open-source code for reproducibility.
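The greedy selection idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: consistency is approximated by cosine similarity between a candidate task's mean embedding and the target's mean embedding, and diversity is approximated by the fraction of target feature energy lying in the span of the selected task means. The names `greedy_select`, `coverage`, and `orthonormal_basis` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def orthonormal_basis(vectors, eps=1e-10):
    """Gram-Schmidt; linearly dependent vectors are skipped."""
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            c = dot(r, b)
            r = [x - c * y for x, y in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > eps:
            basis.append([x / norm for x in r])
    return basis

def coverage(selected_means, target_feats):
    """Assumed proxy for diversity: share of target feature energy
    captured by the span of the selected task means."""
    basis = orthonormal_basis(selected_means)
    captured = sum(dot(f, b) ** 2 for f in target_feats for b in basis)
    total = sum(dot(f, f) for f in target_feats)
    return captured / total

def greedy_select(task_means, target_feats, tol=1e-6):
    """Visit candidate tasks from most to least similar to the target
    and stop as soon as adding a task no longer increases coverage."""
    dim = len(target_feats[0])
    target_mean = [sum(f[k] for f in target_feats) / len(target_feats)
                   for k in range(dim)]
    order = sorted(range(len(task_means)),
                   key=lambda i: cosine_similarity(task_means[i], target_mean),
                   reverse=True)
    selected, best = [], 0.0
    for i in order:
        candidate = [task_means[j] for j in selected] + [task_means[i]]
        score = coverage(candidate, target_feats)
        if score <= best + tol:
            break
        selected.append(i)
        best = score
    return selected

# Toy data (made up for illustration): the target lives in the first two axes.
task_means = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target_feats = [[2.0, 1.0, 0.0], [3.0, 1.0, 0.0]]
chosen = greedy_select(task_means, target_feats)
```

On this toy input the third task adds no coverage of the target features, so the loop stops after selecting the first two tasks.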

Abstract

Foundation models have emerged as a powerful tool for many AI problems. Despite their tremendous success, effective adaptation to new tasks, particularly those with limited labels, remains an open question and lacks theoretical understanding. An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks before adapting it to a target task with limited labeled samples. In this paper, we study the theoretical justification of this multitask finetuning approach. Our theoretical analysis reveals that, with a diverse set of related tasks, multitask finetuning leads to reduced error on the target task compared to directly adapting the same pretrained model. We quantify the relationship between finetuning tasks and target tasks by diversity and consistency metrics, and further propose a practical task selection algorithm. We substantiate our theoretical claims with extensive empirical evidence. Further, we present results showing that our task selection algorithm chooses related finetuning tasks, which benefits model performance on target tasks. We believe our study sheds new light on the effective adaptation of foundation models to new tasks that lack abundant labels. Our code is available at https://github.com/OliverXUZY/Foudation-Model_Multitask.

Paper Structure (80 sections, 21 theorems, 95 equations, 7 figures, 22 tables, 2 algorithms)

Introduction
Related Work
Background: Multitask Finetuning for Few-Shot Learning
Theoretical Analysis: Benefit of Multitask Finetuning
Case Study of Diversity and Consistency
Task Selection
Experiments
Verification of Theoretical Analysis
Task Selection
Effectiveness of Multitask Finetuning
Conclusions
Limitation
More Related Work
Training Foundation Models.
Adapting Foundation Models.
...and 65 more sections

Key Result

Lemma C.1

For any $\phi \in \Phi$ pretrained with the contrastive loss, we have $\mathcal{L}_{sup}(\phi) \le \frac{1}{1-\tau} (\mathcal{L}_{con\text{-}pre}(\phi) - \tau)$.
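As a quick numeric reading of this bound, the sketch below evaluates its right-hand side for given values of the contrastive pretraining loss and $\tau$. The function name `sup_loss_upper_bound` and the input values are hypothetical, and the check that $0 \le \tau < 1$ is an assumption made here so that the factor $\frac{1}{1-\tau}$ is well defined.

```python
def sup_loss_upper_bound(con_pre_loss: float, tau: float) -> float:
    """Right-hand side of the lemma: (L_con-pre - tau) / (1 - tau)."""
    if not 0.0 <= tau < 1.0:
        # Assumption for this sketch: tau lies in [0, 1).
        raise ValueError("tau is assumed to lie in [0, 1)")
    return (con_pre_loss - tau) / (1.0 - tau)

# Made-up values for illustration only.
bound = sup_loss_upper_bound(con_pre_loss=0.3, tau=0.1)
```

With these illustrative inputs the bound is roughly 0.22, so a lower contrastive pretraining loss translates directly into a tighter upper bound on the supervised loss.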

Figures (7)

Figure 1: Illustration of features in linear data. Blue features are encoded in $\mathcal{C}$, while red features are not.
Figure 2: Illustration of similarity and coverage. The target task ($\mathcal{T}_0$) is shown with its most similar tasks in yellow and the rest in blue. The ellipsoid spanned by the yellow tasks is the coverage for the target task. Adding more blue tasks to the ellipsoid does not increase the coverage boundary.
Figure 3: Results on ViT-B backbone pretrained by MoCo v3. (a) Accuracy vs. number of shots per finetuning task. Different curves correspond to different total numbers of samples $Mm$. (b) Accuracy vs. the number of tasks $M$. Different curves correspond to different numbers of samples per task $m$. (c) Accuracy vs. number of samples per task $m$. Different curves correspond to different numbers of tasks $M$.
Figure 4: Dataset selection based on consistency and diversity on DomainNet. The mean-embedding panel shows the consistency. The ellipsoid panel shows the diversity.
Figure 5: Finetuning with different selections of domain datasets. Labels abbreviate the domains used: rp is real and painting, rps is real, painting, and sketch, and so on.
...and 2 more figures

Theorems & Definitions (50)

Definition 1: Diversity
Definition 2: Consistency
Remark 3.1
Lemma C.1: Lemma 4.3 in Arora et al. (2019)
Theorem C.2
Proof: Proof of the theorem labeled thm:con-pre_to_target
Definition 3
Definition 4
Definition 5
Lemma C.3: Bounded Rademacher complexity
...and 40 more
