Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan
TL;DR
This paper addresses the Segment Anything Model's (SAM) weaker performance on domain-specific segmentation by introducing Conv-LoRA, a parameter-efficient fine-tuning (PEFT) method that combines LoRA with lightweight convolutions and a Mixture-of-Experts (MoE) to inject multi-scale local priors into SAM's ViT encoder. The adapted model performs end-to-end multi-class segmentation with the prompt encoder frozen, allowing SAM to capture high-level semantics beyond its binary-mask pretraining. Extensive experiments across medical, natural, agricultural, and remote-sensing domains show that Conv-LoRA consistently outperforms existing PEFT methods with minimal parameter overhead and favorable training efficiency. The results indicate that Conv-LoRA not only preserves SAM's segmentation knowledge but also improves its ability to learn fine-grained semantic distinctions, suggesting broad applicability to real-world domain adaptation. The work also provides insights into SAM's local priors, the role of multi-scale priors, and how adaptive scale selection via MoE benefits downstream segmentation tasks.
Abstract
The Segment Anything Model (SAM) stands as a foundational framework for image segmentation. While it exhibits remarkable zero-shot generalization in typical scenarios, its advantage diminishes when applied to specialized domains like medical imagery and remote sensing. To address this limitation, this paper introduces Conv-LoRA, a simple yet effective parameter-efficient fine-tuning approach. By integrating ultra-lightweight convolutional parameters into Low-Rank Adaptation (LoRA), Conv-LoRA can inject image-related inductive biases into the plain ViT encoder, further reinforcing SAM's local prior assumption. Notably, Conv-LoRA not only preserves SAM's extensive segmentation knowledge but also revives its capacity to learn high-level image semantics, which is constrained by SAM's foreground-background segmentation pretraining. Comprehensive experiments across diverse benchmarks spanning multiple domains underscore Conv-LoRA's superiority in adapting SAM to real-world semantic segmentation tasks.
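To make the mechanism summarized above more concrete, the following is a minimal NumPy sketch of one way a LoRA update could carry a convolutional, scale-selecting step in its low-rank bottleneck. It is an illustration under stated assumptions, not the authors' implementation: the function names, the top-1 gating on mean-pooled features, the nearest-neighbor upsampling with average-pool downsampling, the expert scales `[1, 2, 4]`, and all tensor sizes are hypothetical choices made for this example. The zero-initialized up-projection follows the usual LoRA convention, so the adapted layer initially reproduces the frozen layer.

```python
import numpy as np

def conv3x3(x, kernel):
    """Same-padded 3x3 convolution. x: (H, W, C_in), kernel: (3, 3, C_in, C_out)."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernel.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] @ kernel[dy, dx]
    return out

def upsample(x, s):
    """Nearest-neighbor upsampling by an integer factor s."""
    return x.repeat(s, axis=0).repeat(s, axis=1)

def downsample(x, s):
    """Average pooling by an integer factor s (restores the original grid size)."""
    h, w, c = x.shape
    return x.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3))

def conv_lora_update(tokens, grid, w_down, w_up, w_gate, kernels, scales):
    """Low-rank update with a multi-scale conv expert in the bottleneck (hypothetical sketch).

    tokens: (N, d) patch tokens with N == grid[0] * grid[1].
    Returns the additive update of shape (N, d_out).
    """
    gh, gw = grid
    z = tokens @ w_down                        # LoRA down-projection: (N, r)
    fmap = z.reshape(gh, gw, -1)               # restore the 2D layout of the patch tokens

    # Assumed gating: softmax over experts from mean-pooled features, keep the top-1 expert.
    logits = fmap.mean(axis=(0, 1)) @ w_gate   # (num_experts,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    k = int(np.argmax(probs))

    # The chosen expert applies its 3x3 conv at its own scale, then returns to the grid size.
    s = scales[k]
    y = downsample(conv3x3(upsample(fmap, s), kernels[k]), s)

    return (probs[k] * y).reshape(gh * gw, -1) @ w_up   # LoRA up-projection: (N, d_out)

# Toy usage with made-up sizes.
rng = np.random.default_rng(0)
grid, dim, rank = (4, 4), 16, 4
scales = [1, 2, 4]
tokens = rng.normal(size=(grid[0] * grid[1], dim))
w_frozen = rng.normal(size=(dim, dim))         # stands in for a pretrained weight, never updated
w_down = rng.normal(size=(dim, rank)) * 0.1
w_up = np.zeros((rank, dim))                   # zero init: the update starts at exactly 0
w_gate = rng.normal(size=(rank, len(scales)))
kernels = [rng.normal(size=(3, 3, rank, rank)) * 0.1 for _ in scales]

out = tokens @ w_frozen + conv_lora_update(tokens, grid, w_down, w_up, w_gate, kernels, scales)
```

In this sketch only `w_down`, `w_up`, `w_gate`, and the small `rank x rank` kernels would be trained, which is where the parameter efficiency comes from: the convolutions operate in the low-rank space rather than on the full token dimension.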
