Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition

Mengke Li, Ye Liu, Yang Lu, Yiqun Zhang, Yiu-ming Cheung, Hui Huang

TL;DR

The proposed RSAM-PT significantly improves classification accuracy on long-tailed data, particularly for tail classes. It also employs a deferred re-weight scheme to increase the significance of tail-class samples.

Abstract

Long-tail learning has garnered widespread attention and achieved significant progress in recent times. However, even with pre-trained prior knowledge, models still exhibit weaker generalization performance on tail classes. The promising Sharpness-Aware Minimization (SAM) can effectively improve the generalization capability of models by seeking out flat minima in the loss landscape. This, however, comes at the cost of doubling the computational time, since the update rule of SAM requires two consecutive (non-parallelizable) forward and backward passes at each step. To address this issue, we propose a novel method called Random SAM prompt tuning (RSAM-PT) to improve model generalization, requiring only a single gradient computation at each step. Specifically, we search for the gradient descent direction within a random neighborhood of the parameters during each gradient update. To amplify the impact of tail-class samples and avoid overfitting, we employ the deferred re-weight scheme to increase the significance of tail-class samples. The proposed RSAM-PT significantly improves classification accuracy on long-tailed data, particularly for tail classes. RSAM-PT achieves state-of-the-art performance of 90.3%, 76.5%, and 50.1% on the benchmark datasets CIFAR100-LT (IF 100), iNaturalist 2018, and Places-LT, respectively. The source code is temporarily available at https://github.com/Keke921/GNM-PT.
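To make the cost argument in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It contrasts a SAM-style update (two sequential gradient evaluations) with an update that evaluates the gradient once at a randomly perturbed point, and shows one possible form of deferred re-weighting. Everything named here is an assumption for illustration: the toy quadratic loss, the function names, the hyperparameters `lr`, `rho`, `sigma`, `beta`, `start_epoch`, the Gaussian form of the perturbation, and the effective-number class weights. The abstract only states that a random neighborhood and a deferred re-weight scheme are used.

```python
import math
import random


def grad(theta):
    """Gradient of the toy loss L(theta) = 0.5 * sum(theta_i ** 2) (illustrative only)."""
    return list(theta)


def sam_step(theta, lr=0.1, rho=0.05):
    """SAM-style update: two sequential gradient evaluations per step."""
    g = grad(theta)                                   # 1st gradient, at theta
    norm = math.sqrt(sum(x * x for x in g)) or 1.0    # avoid division by zero
    perturbed = [t + rho * x / norm for t, x in zip(theta, g)]
    g_adv = grad(perturbed)                           # 2nd gradient, at theta + eps
    new_theta = [t - lr * x for t, x in zip(theta, g_adv)]
    return new_theta, 2                               # 2 = gradient evaluations used


def random_neighborhood_step(theta, rng, lr=0.1, sigma=0.05):
    """Single-gradient update at a random point near theta.

    Assumption: the perturbation is isotropic Gaussian noise with std sigma.
    """
    eps = [rng.gauss(0.0, sigma) for _ in theta]
    perturbed = [t + e for t, e in zip(theta, eps)]
    g = grad(perturbed)                               # the only gradient evaluation
    new_theta = [t - lr * x for t, x in zip(theta, g)]
    return new_theta, 1


def deferred_class_weights(class_counts, epoch, start_epoch, beta=0.9999):
    """Uniform weights before start_epoch, class-balanced weights afterwards.

    Assumption: weights follow the effective-number form (1 - beta) / (1 - beta ** n),
    rescaled so that they sum to the number of classes.
    """
    if epoch < start_epoch:
        return [1.0] * len(class_counts)
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [1.0, -2.0]
    print(sam_step(theta))
    print(random_neighborhood_step(theta, rng))
    print(deferred_class_weights([500, 5], epoch=10, start_epoch=10))
```

In this sketch the random step skips the first gradient evaluation that SAM needs to locate its perturbation, which is where the per-step saving comes from. The re-weighting function leaves head and tail classes equally weighted early in training and shifts weight toward rare classes only once `epoch` reaches `start_epoch`.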

Paper Structure

This paper contains 29 sections, 1 theorem, 16 equations, 10 figures, 9 tables, and 1 algorithm.

Key Result

Theorem 1

For any $0<\delta<1$, and number of samples $n\in \mathbb{N}^{+}$, with probability $1-\delta$ over the training set $\mathcal{T}$ sampled i.i.d. from a distribution $\mathcal{D}$, the following generalization bound w.r.t. model parameters $\boldsymbol{\theta}$ holds: where $h:\mathbb{R}^+\rightarrow \mathbb{R}^+$ is a strictly increasing function.

Figures (10)

  • Figure 1: Loss landscape comparison of VPT based on ViT-B/16 with CE loss (best view in color). The dataset used is CIFAR100-LT with an imbalance ratio of 100.
  • Figure 2: Schematic of optimization direction in GNM. $\boldsymbol{\theta}^{Ori}_{t+1}$ and $\boldsymbol{\theta}^{GNM}_{t+1}$ represent the gradient update without and with GNM for step $t+1$, respectively.
  • Figure 3: Effectiveness comparison of different classes.
  • Figure 4: GCL loss landscapes based on ViT-B/16 (best view in color).
  • Figure 5: Comparison of optimization directions. $\boldsymbol{\theta}^{Ori}_{t+1}$, $\boldsymbol{\theta}^{SAM}_{t+1}$ and $\boldsymbol{\theta}^{GNM}_{t+1}$ represent the original gradient update, gradient update with SAM and with GNM for step $t+1$, respectively.
  • ...and 5 more figures

Theorems & Definitions (10)

  • Remark 1
  • proof
  • Remark 2
  • proof
  • Remark 3
  • Theorem 1
  • proof
  • Remark 4
  • proof
  • proof