
Rethinking Low-Rank Adaptation in Vision: Exploring Head-Level Responsiveness across Diverse Tasks

Yibo Zhong, Yao Zhou

TL;DR

The proposed method explores redundancy among the heads and selectively activates task-responsive heads, enabling fine-grained head-level tuning and yielding superior performance over state-of-the-art PETL approaches on visual adaptation benchmark datasets.

Abstract

Low-rank adaptation (LoRA) has shifted the paradigm of adapting pre-trained Vision Transformers (ViT). It achieves high efficiency by updating only a small set of tailored parameters that approximate the weight updates. However, in the multi-head design of self-attention, the heads work in parallel and often exhibit similar visual patterns, yet all of them must be updated, which incurs unnecessary storage and computational overhead. In this paper, we propose Head-level responsiveness tuning for low-rank adaptation (Heart-LoRA). The proposed method explores redundancy among the heads and selectively activates task-responsive heads, enabling fine-grained head-level tuning. Moreover, since heads respond differently to diverse visual tasks, our method dynamically activates the subset of approximated heads tailored to the current task. Experimental results show that Heart-LoRA outperforms state-of-the-art parameter-efficient transfer learning (PETL) approaches on visual adaptation benchmark datasets.
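To make head-level tuning concrete, the sketch below restricts a standard LoRA update $\Delta W = BA$ to a chosen set of heads. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the output features of an attention projection (e.g. query or value) are split into `num_heads` contiguous blocks of `head_dim` rows. It also assumes that deactivating a head means removing that head's share of the low-rank update while the frozen pre-trained weight `W0` stays untouched. The function names `head_masked_lora_delta` and `adapted_projection` are hypothetical.

```python
import numpy as np


def head_masked_lora_delta(A, B, num_heads, active_heads, scale=1.0):
    """Return scale * (B_masked @ A), where B_masked keeps only the active heads' rows.

    Assumption: the projection's output dimension (rows of B) is split into
    `num_heads` contiguous blocks of equal size, one block per attention head.
    """
    d_out = B.shape[0]
    if d_out % num_heads != 0:
        raise ValueError("output dimension must be divisible by num_heads")
    head_dim = d_out // num_heads
    # Row mask: 1.0 for rows that belong to an active head, 0.0 otherwise.
    mask = np.zeros(d_out)
    for h in active_heads:
        mask[h * head_dim:(h + 1) * head_dim] = 1.0
    # Zeroing rows of B zeroes the same rows of B @ A, i.e. that head's update.
    return scale * ((B * mask[:, None]) @ A)


def adapted_projection(x, W0, A, B, num_heads, active_heads, scale=1.0):
    """Compute x @ (W0 + Delta W).T; the pre-trained weight W0 stays frozen."""
    delta = head_masked_lora_delta(A, B, num_heads, active_heads, scale)
    return x @ (W0 + delta).T


# Toy example: 4 heads of size 2, input dim 8, LoRA rank 2 (random values).
rng = np.random.default_rng(0)
num_heads, head_dim, d_in, r = 4, 2, 8, 2
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(num_heads * head_dim, r))
delta = head_masked_lora_delta(A, B, num_heads, active_heads=[1, 2])
print(np.abs(delta[:head_dim]).max())  # 0.0: head 0 receives no update
```

Because row $i$ of $BA$ depends only on row $i$ of $B$, zeroing the rows of $B$ for an inactive head is equivalent to masking $\Delta W$. Those rows never influence the output, so in this sketch they would not need to be trained or stored.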

Paper Structure (33 sections, 6 equations, 8 figures, 9 tables, 1 algorithm)

Figures (8)

  • Figure 1: An example of recognizing the responsiveness of the heads and then toggling the set of active heads across diverse tasks with different characteristics. The example images for the natural and specialized tasks are taken from the VTAB-1K benchmark. For the natural image, heads 1 and 4 are assumed to be the two least responsive heads (since $ne=2$ here), while for the specialized image these are heads 2 and 3. Once recognized, these heads are switched off during adaptation to eliminate their influence. The sets of active heads are therefore $\{2, 3\}$ and $\{1, 4\}$, respectively.
  • Figure 2: Full results on few-shot learning. Compared methods include NOAH, Bi-LoRA, VPT, LoRA, and Adapter. All experiments are conducted under a setting with shots ranging from 1 to 16. The results for each shot are averages over three distinct seeds.
  • Figure 3: To test the effect of arbitrary deactivation, $ne$ values of 1, 3, and 6 were applied to Heart-LoRA and compared with the baseline without any deactivation. $ne=x$ means that the first $x$ heads in Heart-LoRA are deactivated, while $ne=0$ serves as the baseline. Values in the figure are the ratio of the results after deactivation to the baseline results.
  • Figure 4: Impact of $ne$ on performance on the CIFAR-100 and dSpr-Loc datasets.
  • Figure 5: Comparison of attention maps between Heart-LoRA and Bi-LoRA. The first image is taken from the Caltech101 dataset and the second from the Flowers102 dataset. We use $ne=3$ and $ne=6$, respectively, for Heart-LoRA. The attention maps remain nearly the same after these heads are deactivated, demonstrating the existence of head redundancy.
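The head selection described in Figure 1 can be sketched as follows. The sketch assumes that each head already has a scalar responsiveness score, with higher meaning more responsive. How Heart-LoRA actually measures responsiveness is not described in this section, so the scores in the usage lines are made-up values chosen only to reproduce the Figure 1 example, not measurements. Heads are 0-indexed in code and converted to the figure's 1-based numbering for display. `first_heads_deactivated` mirrors the arbitrary "first $x$ heads" deactivation of Figure 3. Both function names are hypothetical.

```python
def select_active_heads(responsiveness, ne):
    """Switch off the `ne` least responsive heads and return the active ones.

    `responsiveness` holds one score per head (higher = more responsive);
    how the scores are obtained is an assumption left to the caller.
    Returns the sorted 0-based indices of the heads that stay active.
    """
    num_heads = len(responsiveness)
    if not 0 <= ne < num_heads:
        raise ValueError("ne must satisfy 0 <= ne < num_heads")
    # Rank heads from least to most responsive; ties are broken by head index.
    ranked = sorted(range(num_heads), key=lambda h: (responsiveness[h], h))
    deactivated = set(ranked[:ne])
    return [h for h in range(num_heads) if h not in deactivated]


def first_heads_deactivated(num_heads, ne):
    """Arbitrary deactivation (as in Figure 3): the first `ne` heads are switched off."""
    if not 0 <= ne < num_heads:
        raise ValueError("ne must satisfy 0 <= ne < num_heads")
    return list(range(ne, num_heads))


# Illustrative (made-up) scores for four heads, chosen to match Figure 1.
natural = [0.1, 0.7, 0.6, 0.2]      # heads 1 and 4 are least responsive
specialized = [0.8, 0.1, 0.3, 0.9]  # heads 2 and 3 are least responsive

print([h + 1 for h in select_active_heads(natural, ne=2)])      # [2, 3]
print([h + 1 for h in select_active_heads(specialized, ne=2)])  # [1, 4]
print(first_heads_deactivated(num_heads=4, ne=1))               # [1, 2, 3]
```

In this sketch, the set of active heads is recomputed per task from its scores, which is one way to realize the task-dependent toggling the captions describe. The returned indices could be passed as the active heads of a head-masked LoRA update.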