NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation

Renqi Chen; Haoyang Su; Shixiang Tang

TL;DR

NAS-LoRA tackles the challenge of adapting SAM to domain-specific tasks by inserting a lightweight NAS block between LoRA's encoder and decoder to dynamically inject task-relevant inductive biases. It introduces a stage-wise optimization strategy and a PEFT-compatible NAS variant (NAS-PC-LoRA) to maintain efficiency while improving high-level semantic learning. Across nine segmentation benchmarks, NAS-LoRA and NAS-PC-LoRA outperform existing PEFT methods, achieving higher accuracy with around 24% lower training cost and no increase in inference cost. The work demonstrates that neural architecture search can be effectively and practically integrated into parameter-efficient fine-tuning for visual foundation models.

Abstract

The Segment Anything Model (SAM) has emerged as a powerful visual foundation model for image segmentation. However, adapting SAM to specific downstream tasks, such as medical and agricultural imaging, remains a significant challenge. To address this, Low-Rank Adaptation (LoRA) and its variants have been widely employed to enhance SAM's adaptation performance across diverse domains. Despite these advances, a critical question arises: can we integrate inductive bias into the model? This is particularly relevant because the Transformer encoder in SAM inherently lacks spatial priors within image patches, which can hinder the acquisition of high-level semantic information. In this paper, we propose NAS-LoRA, a new Parameter-Efficient Fine-Tuning (PEFT) method designed to bridge the semantic gap between pre-trained SAM and specialized domains. Specifically, NAS-LoRA incorporates a lightweight Neural Architecture Search (NAS) block between the encoder and decoder components of LoRA to dynamically optimize the prior knowledge integrated into weight updates. Furthermore, we propose a stage-wise optimization strategy that helps the ViT encoder balance weight updates and architectural adjustments, facilitating the gradual learning of high-level semantic information. Experiments demonstrate that NAS-LoRA improves on existing PEFT methods while reducing training cost by 24.14% without increasing inference cost, highlighting the potential of NAS in enhancing PEFT for visual foundation models.
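
To make the structure described in the abstract concrete, here is a minimal NumPy sketch of a low-rank update with a searchable block between the LoRA encoder (down-projection) and decoder (up-projection). It is an illustration under stated assumptions, not the authors' implementation. The class name `NASLoRASketch`, the two candidate operations (identity and a 3x3 mean filter over the patch grid), and the DARTS-style softmax mixing over architecture logits `alpha` are all hypothetical choices; the abstract does not specify the search space or how the search is relaxed.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def avg_pool3x3(z):
    """3x3 mean filter over an (H, W, r) grid with zero padding.
    Hypothetical stand-in for a candidate op that adds a local spatial prior."""
    h, w, _ = z.shape
    p = np.pad(z, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(z)
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + w, :]
    return out / 9.0

class NASLoRASketch:
    """Low-rank update with a searchable block between down- and up-projection."""

    def __init__(self, d, r, grid, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = grid                           # (H, W) layout of patch tokens
        self.A = rng.normal(0.0, 0.02, (d, r))     # LoRA encoder (down-projection)
        self.B = np.zeros((r, d))                  # LoRA decoder, zero-init as in standard LoRA
        self.ops = [lambda z: z, avg_pool3x3]      # assumed candidate operations
        self.alpha = np.zeros(len(self.ops))       # architecture logits (assumed relaxation)

    def delta(self, x):
        """x: (H*W, d) patch tokens -> (H*W, d) update added to the frozen layer's output."""
        h, w = self.grid
        z = (x @ self.A).reshape(h, w, -1)         # project to rank r, restore the 2D grid
        weights = softmax(self.alpha)              # soft selection over candidate ops
        mixed = sum(wt * op(z) for wt, op in zip(weights, self.ops))
        return mixed.reshape(h * w, -1) @ self.B   # project back to model width d

# Usage: a 4x4 patch grid with model width 8 and rank 2.
x = np.random.default_rng(1).normal(size=(16, 8))
block = NASLoRASketch(d=8, r=2, grid=(4, 4))
out = block.delta(x)                               # shape (16, 8)
```

Because `B` starts at zero, the update is zero before any training, so the adapted layer initially matches the pre-trained one. In this sketch the weight matrices (`A`, `B`) and the architecture logits (`alpha`) are separate parameter groups, which is the kind of split a stage-wise schedule could alternate between; the actual schedule used in the paper is not given in the abstract.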
