Table of Contents
Fetching ...

Kernelized Sparse Fine-Tuning with Bi-level Parameter Competition for Vision Models

Shufan Shen, Junshu Sun, Shuhui Wang, Qingming Huang

TL;DR

SNELLA tackles the memory bottleneck of sparse fine-tuning by unifying end-to-end weight selection and updating in a single stage. It kernelizes LoRA by merging two low-rank matrices with nonlinear kernels to form a high-rank adaptation ΔW, enabling expressive yet memory-efficient updates. An adaptive bi-level sparsity mechanism allocates tunable weights across layers and within layers based on smoothed sensitivity and uncertainty, yielding task-relevant weight selection without gradient-masking masks. Across classification, segmentation, and generation benchmarks, SNELLA achieves state-of-the-art results with substantial memory savings, demonstrating strong generalization across model scales and pre-training strategies, and highlighting the potential for scalable, interpretable PEFT in vision and beyond.

Abstract

Parameter-efficient fine-tuning (PEFT) aims to adapt pre-trained vision models to downstream tasks. Among PEFT paradigms, sparse tuning achieves remarkable performance by adjusting only the weights most relevant to downstream tasks, rather than densely tuning the entire weight matrix. Current methods follow a two-stage paradigm. First, it locates task-relevant weights by gradient information, which overlooks the parameter adjustments during fine-tuning and limits the performance. Second, it updates only the located weights by applying a sparse mask to the gradient of the weight matrix, which results in high memory usage due to the storage of all weight matrices in the optimizer. In this paper, we propose a one-stage method named SNELLA to overcome the above limitations. For memory usage, SNELLA selectively updates the weight matrix by adding it to another sparse matrix that is merged by two low-rank learnable matrices. We extend the low-rank decomposition by introducing nonlinear kernel functions, thereby increasing the rank of the resulting merged matrix to prevent the interdependency among weight updates, enabling better adaptation to downstream tasks. For locating task-relevant weights, we propose an adaptive bi-level sparsity allocation mechanism that encourages weights to compete across and inside layers based on their importance scores in an end-to-end manner. Extensive experiments are conducted on classification, segmentation, and generation tasks using different pre-trained vision models. The results show that SNELLA achieves SOTA performance with low memory usage. Notably, SNELLA obtains 1.8% (91.9% v.s. 90.1%) higher Top-1 accuracy on the FGVC benchmark compared to SPT-LoRA. Compared to previous methods, SNELLA achieves a memory reduction of 31.1%-39.9% across models with parameter scales from 86M to 632M. Our source codes are available at https://github.com/ssfgunner/SNELL.

Kernelized Sparse Fine-Tuning with Bi-level Parameter Competition for Vision Models

TL;DR

SNELLA tackles the memory bottleneck of sparse fine-tuning by unifying end-to-end weight selection and updating in a single stage. It kernelizes LoRA by merging two low-rank matrices with nonlinear kernels to form a high-rank adaptation ΔW, enabling expressive yet memory-efficient updates. An adaptive bi-level sparsity mechanism allocates tunable weights across layers and within layers based on smoothed sensitivity and uncertainty, yielding task-relevant weight selection without gradient-masking masks. Across classification, segmentation, and generation benchmarks, SNELLA achieves state-of-the-art results with substantial memory savings, demonstrating strong generalization across model scales and pre-training strategies, and highlighting the potential for scalable, interpretable PEFT in vision and beyond.

Abstract

Parameter-efficient fine-tuning (PEFT) aims to adapt pre-trained vision models to downstream tasks. Among PEFT paradigms, sparse tuning achieves remarkable performance by adjusting only the weights most relevant to downstream tasks, rather than densely tuning the entire weight matrix. Current methods follow a two-stage paradigm. First, it locates task-relevant weights by gradient information, which overlooks the parameter adjustments during fine-tuning and limits the performance. Second, it updates only the located weights by applying a sparse mask to the gradient of the weight matrix, which results in high memory usage due to the storage of all weight matrices in the optimizer. In this paper, we propose a one-stage method named SNELLA to overcome the above limitations. For memory usage, SNELLA selectively updates the weight matrix by adding it to another sparse matrix that is merged by two low-rank learnable matrices. We extend the low-rank decomposition by introducing nonlinear kernel functions, thereby increasing the rank of the resulting merged matrix to prevent the interdependency among weight updates, enabling better adaptation to downstream tasks. For locating task-relevant weights, we propose an adaptive bi-level sparsity allocation mechanism that encourages weights to compete across and inside layers based on their importance scores in an end-to-end manner. Extensive experiments are conducted on classification, segmentation, and generation tasks using different pre-trained vision models. The results show that SNELLA achieves SOTA performance with low memory usage. Notably, SNELLA obtains 1.8% (91.9% v.s. 90.1%) higher Top-1 accuracy on the FGVC benchmark compared to SPT-LoRA. Compared to previous methods, SNELLA achieves a memory reduction of 31.1%-39.9% across models with parameter scales from 86M to 632M. Our source codes are available at https://github.com/ssfgunner/SNELL.

Paper Structure

This paper contains 35 sections, 16 equations, 16 figures, 18 tables, 1 algorithm.

Figures (16)

  • Figure 1: (a) The two-stage paradigm first locates task-relevant weights based on gradients and then directly updates the located weights. (b) SNELLA updates the pre-trained weights via a sparse matrix merged by low-rank matrices. (c) Our method enables locating and updating task-relevant weights in an end-to-end manner with low memory usage.
  • Figure 2: Performance comparisons between LoRA hu2021lora, SSF lian2022scaling, SNELL-8 SNELL and SNELLA-8 across different pre-trained models and benchmarks. SNELLA demonstrates superior performance over others.
  • Figure 3: Overview of our SNELLA strategy. Given two learnable low-rank matrices, we merge them using a non-linear kernel function (left). This merging process is equivalent to mapping the matrices into higher-rank matrices and then performing matrix multiplication. Then we sparsify this merged matrix using an adaptive sparsity allocation mechanism (right). First, the layers compete with each other to determine their number of tunable weights $b$. Then, competition within layers is conducted by preserving the top-$b$ weight updates and setting the remaining updates to zero.
  • Figure 4: (a) Visualization examples of multiple kernel functions in the one-dimensional form. (b) To evaluate the expressivity of different kernels, we fit random sparse matrices with varying ranks by merging two learnable low-rank matrices with these kernels and compute the MSE loss. (c) Gradient evolution of different kernels during fine-tuning. Experiments are conducted on the Stanford-Cars dataset using pre-trained ViT-B/16.
  • Figure 5: Layer-level competition mechanism. We integrate both sensitivity and uncertainty to compute an importance score for each layer. These scores then serve as the basis for competition among layers, enabling more important layers to gain a larger number of tunable parameters.
  • ...and 11 more figures