One Step Learning, One Step Review

Xiaolong Huang, Qiankun Li, Xueran Li, Xuesong Gao

TL;DR

OLOR addresses knowledge forgetting during fine-tuning of large pre-trained vision models by coupling a weight rollback term with standard optimizers and a layer-wise penalty that adjusts rollback by layer depth. The weight rollback anchors updates toward the pre-trained state, with a per-step formulation that converges toward the upstream weights as the rollback strength increases, while the layer-wise penalty decays rollback across layers and introduces a diversified rate to adapt to task similarity. Across ten downstream tasks and multiple backbones, OLOR achieves state-of-the-art results, demonstrates robust transfer across pre-training sources, and remains compatible with Adam and SGD; ablations and forgetting analyses further substantiate its effectiveness. The work provides an efficient, general fine-tuning framework with practical impact for leveraging large pre-trained vision models in diverse applications, with code released to facilitate adoption.

Abstract

Visual fine-tuning has garnered significant attention with the rise of pre-trained vision models. The current prevailing method, full fine-tuning, suffers from knowledge forgetting because it focuses solely on fitting the downstream training set. In this paper, we propose a novel weight rollback-based fine-tuning method called OLOR (One step Learning, One step Review). OLOR combines fine-tuning with optimizers, incorporating a weight rollback term into the weight update term at each step. This ensures consistency in the weight range of upstream and downstream models, effectively mitigating knowledge forgetting and enhancing fine-tuning performance. In addition, a layer-wise penalty is introduced that employs penalty decay and a diversified decay rate to adjust the weight rollback levels of layers, adapting to varying downstream tasks. Through extensive experiments on various tasks such as image classification, object detection, semantic segmentation, and instance segmentation, we demonstrate the general applicability and state-of-the-art performance of our proposed OLOR. Code is available at https://github.com/rainbow-xiao/OLOR-AAAI-2024.
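
The abstract does not give the update equations, so the following is only a minimal sketch of the idea it describes, not the authors' implementation. It assumes the rollback is a linear pull of each freshly updated weight toward its pre-trained value, and that the per-layer penalty decays from shallow to deep layers along a power curve. The function names, the `gamma` parameter (standing in for the diversified decay rate), the decay direction, and the use of plain SGD are all assumptions made for illustration.

```python
from typing import List


def layer_penalties(num_layers: int, max_penalty: float,
                    min_penalty: float, gamma: float = 1.0) -> List[float]:
    """Hypothetical layer-wise penalty schedule (not taken from the paper).

    Layer 0 (shallowest) receives max_penalty, the deepest layer receives
    min_penalty, and gamma bends the decay curve in between.
    """
    if num_layers == 1:
        return [max_penalty]
    penalties = []
    for i in range(num_layers):
        depth = i / (num_layers - 1)  # 0.0 for the first layer, 1.0 for the last
        scale = (1.0 - depth) ** gamma
        penalties.append(min_penalty + (max_penalty - min_penalty) * scale)
    return penalties


def rollback_sgd_step(weights: List[float], pretrained: List[float],
                      grads: List[float], lr: float,
                      penalty: float) -> List[float]:
    """One plain SGD step ("learning") followed by a rollback ("review").

    Assumed form: the candidate weight is pulled toward the pre-trained
    weight by a fraction `penalty` of the gap between them.
    """
    updated = []
    for w, w0, g in zip(weights, pretrained, grads):
        candidate = w - lr * g                           # ordinary optimizer update
        updated.append(candidate + penalty * (w0 - candidate))  # rollback term
    return updated


if __name__ == "__main__":
    # Toy model: three "layers" with one scalar weight each.
    pretrained = [1.0, 1.0, 1.0]
    weights = [1.5, 1.5, 1.5]
    grads = [0.2, 0.2, 0.2]
    lambdas = layer_penalties(num_layers=3, max_penalty=0.5, min_penalty=0.1)
    for layer, lam in enumerate(lambdas):
        new_w = rollback_sgd_step([weights[layer]], [pretrained[layer]],
                                  [grads[layer]], lr=0.1, penalty=lam)
        print(layer, lam, new_w)
```

Under this assumed form, a penalty of 0 reduces to ordinary fine-tuning and a penalty of 1 returns the pre-trained weights, which matches the TL;DR's description of the update converging toward the upstream weights as rollback strength increases.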

Paper Structure

This paper contains 29 sections, 10 equations, 5 figures, 7 tables, 2 algorithms.

Figures (5)

  • Figure 1: Overview of OLOR using Adam as the optimizer, where $\lambda_i$ represents the penalty factor of the $i$-th layer, and $\theta_t$ and $\hat{\theta}_{t+1}$ represent the weight and the estimate of the next weight (pre-weight) at timestep $t$, respectively. The transparency of the image indicates the level of knowledge forgetting.
  • Figure 2: Training loss and validation top-1 accuracy on CIFAR-100, using ViT-B with Adam and ConvNeXt-B with SGD.
  • Figure 3: Knowledge forgetting test on PACS. During pre-training, fold 1 is the training set and fold 2 is the validation set; during fine-tuning, the splits are reversed.
  • Figure 4: Hyper-parameter exploration experiments on CIFAR-100 (left) and PACS (right), both using ViT-B with Adam.
  • Figure 5: Feature visualization on the PACS test set. Features extracted by the backbone are visualized with t-SNE, and the top-1 accuracy is also reported.