LIFT+: Lightweight Fine-Tuning for Long-Tail Learning

Jiang-Xin Shi, Tong Wei, Yu-Feng Li

TL;DR

Heavy fine-tuning can cause non-negligible performance deterioration on tail classes, whereas lightweight fine-tuning is more effective.

Abstract

The fine-tuning paradigm has emerged as a prominent approach for addressing long-tail learning tasks in the era of foundation models. However, the impact of fine-tuning strategies on long-tail learning performance remains unexplored. In this work, we disclose that existing paradigms exhibit a profound misuse of fine-tuning methods, leaving significant room for improvement in both efficiency and accuracy. Specifically, we reveal that heavy fine-tuning (fine-tuning a large proportion of model parameters) can lead to non-negligible performance deterioration on tail classes, whereas lightweight fine-tuning demonstrates superior effectiveness. Through comprehensive theoretical and empirical validation, we identify this phenomenon as stemming from inconsistent class conditional distributions induced by heavy fine-tuning. Building on this insight, we propose LIFT+, an innovative lightweight fine-tuning framework to optimize consistent class conditions. Furthermore, LIFT+ incorporates semantic-aware initialization, minimalist data augmentation, and test-time ensembling to enhance adaptation and generalization of foundation models. Our framework provides an efficient and accurate pipeline that facilitates fast convergence and model compactness. Extensive experiments demonstrate that LIFT+ significantly reduces both training epochs (from $\sim$100 to $\leq$15) and learned parameters (less than 1%), while surpassing state-of-the-art approaches by a considerable margin. The source code is available at https://github.com/shijxcs/LIFT-plus.

Paper Structure

This paper contains 20 sections, 1 theorem, 10 equations, 16 figures, 19 tables, and 2 algorithms.

Key Result

Proposition 3.1

The underestimated class-conditional probability ${\operatorname{P}}(\phi({\bm{x}})\mid y=j)$ leads to an underestimated loss on class $j$ and a biased prediction towards other classes.
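The "biased prediction" part of this statement can be illustrated with Bayes' rule. The following is a sketch under stated assumptions, not the paper's proof: the factor $\alpha$ and the estimated quantity $\hat{\operatorname{P}}$ are introduced here purely for illustration, and all other classes are assumed to be estimated exactly.

$$
\operatorname{P}(y=j \mid \phi(\bm{x})) = \frac{\operatorname{P}(\phi(\bm{x}) \mid y=j)\,\operatorname{P}(y=j)}{\sum_{k} \operatorname{P}(\phi(\bm{x}) \mid y=k)\,\operatorname{P}(y=k)}
$$

Suppose the model's estimate for class $j$ is $\hat{\operatorname{P}}(\phi(\bm{x}) \mid y=j) = \alpha\,\operatorname{P}(\phi(\bm{x}) \mid y=j)$ with $0 < \alpha < 1$. Then

$$
\hat{\operatorname{P}}(y=j \mid \phi(\bm{x})) = \frac{\alpha\,\operatorname{P}(\phi(\bm{x}) \mid y=j)\,\operatorname{P}(y=j)}{\alpha\,\operatorname{P}(\phi(\bm{x}) \mid y=j)\,\operatorname{P}(y=j) + \sum_{k \neq j} \operatorname{P}(\phi(\bm{x}) \mid y=k)\,\operatorname{P}(y=k)} < \operatorname{P}(y=j \mid \phi(\bm{x})),
$$

because $a/(a+b)$ is increasing in $a$ for $a, b > 0$. The same shrinking denominator raises the estimated posterior of every class $k \neq j$, so predictions shift away from class $j$.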

Figures (16)

  • Figure 1: (a-b) On ImageNet-LT and Places-LT, zero-shot CLIP has surpassed many prior methods. By simply introducing an additional classifier, the accuracy further increases. However, the improvements mainly come from the head classes, while the tail classes only achieve marginal enhancements. (c) On iNaturalist 2018, zero-shot CLIP encounters challenges in achieving high accuracy for fine-grained long-tail categories.
  • Figure 2: Comparison of different fine-tuning strategies. Full fine-tuning improves head-class accuracy but severely decreases tail-class performance, even when employing balanced loss and classifier initialization.
  • Figure 3: Inter-class feature similarities (heatmaps) and intra-class distance distributions from tail classes (histograms) on ImageNet-LT. Classifier fine-tuning limits head-class performance due to high inter-class similarities. Full fine-tuning optimizes inter-class similarities but leads to inconsistent distribution between train and test data on tail classes.
  • Figure 4: Fine-tuning a small proportion of all parameters (e.g., 0.1%-2%) yields superior performance. As the proportion increases, performance deteriorates even when we search for the best learning rate.
  • Figure 5: Inter-class feature similarities (heatmaps) and intra-class distributions from tail classes (histograms) on ImageNet-LT. Both arbitrary and structured lightweight fine-tuning perform well in optimizing inter-class similarities and preserving intra-class distributions.
  • ...and 11 more figures
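
The captions for Figures 4 and 5 refer to fine-tuning only a small proportion of parameters, including an "arbitrary" variant. A minimal sketch of that idea follows, assuming the subset is chosen uniformly at random and held fixed during training. The function names `make_sparse_mask` and `masked_sgd_step`, the plain-SGD update, and the toy parameter list are hypothetical and are not taken from the LIFT+ source code.

```python
import random


def make_sparse_mask(num_params, proportion, seed=0):
    """Pick a fixed random subset of parameter indices to train (hypothetical helper)."""
    rng = random.Random(seed)
    k = max(1, round(num_params * proportion))
    return set(rng.sample(range(num_params), k))


def masked_sgd_step(params, grads, trainable, lr=0.01):
    """Apply an SGD update only at the selected indices; all others stay frozen."""
    return [
        p - lr * g if i in trainable else p
        for i, (p, g) in enumerate(zip(params, grads))
    ]


# Toy example: 1000 "parameters", of which 1% are trainable.
params = [1.0] * 1000
grads = [0.5] * 1000
trainable = make_sparse_mask(len(params), proportion=0.01)
updated = masked_sgd_step(params, grads, trainable)

# Count how many parameters actually moved after one step.
changed = sum(1 for old, new in zip(params, updated) if old != new)
```

In this toy setting only the masked entries change after a step, which mirrors the "learned parameters (less than 1%)" regime described in the abstract. A real implementation would apply such a mask to the weights of a foundation model rather than a flat list.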

Theorems & Definitions (3)

  • Proposition 3.1
  • proof
  • Remark 3.2