
Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model

Zelin Peng, Zhengqin Xu, Zhilin Zeng, Lingxi Xie, Qi Tian, Wei Shen

TL;DR

This work tackles the challenge of efficiently fine-tuning large segmentation models. Traditional PEFT injects parameters into each block individually, so its capacity to adjust projection directions is limited by the Hidden Markov Chain (HMC) along blocks. The paper introduces SAM-COBOT, a cross-block orchestration framework with two parts: an inter-block communication module built on a learnable relation matrix and dual coefficient sets, and an intra-block enhancement module whose hyper-complex layer generates richer projection-direction adjustments. Together they add only around 1K trainable parameters. The method is plug-and-play with existing PEFT approaches such as LoRA and AdaptFormer and shows consistent gains across natural, remote sensing, and medical segmentation benchmarks, including improvements on ViT-based backbones. It offers a practical way to adapt foundation segmentation models to diverse downstream tasks with limited data, without large-scale fine-tuning.

Abstract

Parameter-efficient fine-tuning (PEFT) is an effective methodology to unleash the potential of large foundation models in novel scenarios with limited training data. In the computer vision community, PEFT has shown effectiveness in image classification, but little research has studied its ability for image segmentation. Fine-tuning segmentation models usually requires a heavier adjustment of parameters to align the proper projection directions in the parameter space for new scenarios. This raises a challenge to existing PEFT algorithms, as they often inject a limited number of individual parameters into each block, which prevents substantial adjustment of the projection direction of the parameter space due to the limitation of the Hidden Markov Chain along blocks. In this paper, we equip PEFT with a cross-block orchestration mechanism to enable the adaptation of the Segment Anything Model (SAM) to various downstream scenarios. We introduce a novel inter-block communication module, which integrates a learnable relation matrix to facilitate communication among different coefficient sets of each PEFT block's parameter space. Moreover, we propose an intra-block enhancement module, which introduces a linear projection head whose weights are generated from a hyper-complex layer, further enhancing the impact of the adjustment of projection directions on the entire parameter space. Extensive experiments on diverse benchmarks demonstrate that our proposed approach consistently and significantly improves segmentation performance on novel scenarios with only around 1K additional parameters.
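
The inter-block communication idea in the abstract can be pictured with a small sketch. This is an illustration only, not the paper's formulation: it assumes each PEFT block holds one coefficient vector of length `r`, and that a hypothetical `relation` matrix of size L x L (L = number of blocks) mixes those vectors linearly. The names `mix_coefficients`, `relation`, and `coeff_sets` are made up for this example.

```python
def mix_coefficients(relation, coeff_sets):
    """Mix per-block coefficient vectors with a relation matrix.

    relation:   L x L nested list (hypothetical learnable relation matrix).
    coeff_sets: L vectors of length r, one per PEFT block (assumed shape).
    Returns L mixed vectors, where out[i] = sum_j relation[i][j] * coeff_sets[j].
    """
    num_blocks = len(coeff_sets)
    r = len(coeff_sets[0])
    return [
        [
            sum(relation[i][j] * coeff_sets[j][k] for j in range(num_blocks))
            for k in range(r)
        ]
        for i in range(num_blocks)
    ]


# Two blocks, each with a coefficient vector of length r = 2.
coeffs = [[1.0, 2.0], [3.0, 4.0]]

# Identity relation: every block keeps its own coefficients (no communication).
no_communication = mix_coefficients([[1.0, 0.0], [0.0, 1.0]], coeffs)

# Uniform relation: every block receives the average of all blocks.
full_communication = mix_coefficients([[0.5, 0.5], [0.5, 0.5]], coeffs)
```

Under these assumptions the relation matrix itself contributes only L x L extra parameters, which is one way such a module could stay small; the paper's actual parameter accounting may differ.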

Paper Structure

This paper contains 17 sections, 15 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Comparison between traditional PEFT paradigms and our proposed SAM-COBOT. (a) Traditional methods typically adjust the projection direction of each layer in SAM's parameter space individually, which is limited by the Hidden Markov Chain (HMC). This often leads to relatively minor adjustments. (b) In contrast, our SAM-COBOT approach enhances PEFT with cross-block orchestration, enabling larger and more effective adjustments of the projection directions.
  • Figure 2: A schematic representation of SAM-COBOT. In the SAM-COBOT framework, we integrate an inter-block communication module followed by an intra-block enhancement module in each PEFT block.
  • Figure 3: The detailed structure of the inter-block communication (IBC) module. We introduce two coefficient sets, $\Lambda^{\text{MC}}_{\ell}$ and $\Lambda^{\text{LM}}_{\ell}$. The former is communicated under the limitation of HMC, while the latter communicates with the coefficient sets of other blocks. (Best viewed in color.)
  • Figure 4: The detailed structure of the intra-block enhancement (IBE) module. We introduce a hyper-complex layer (HL) to facilitate communication among projection directions in each layer. "Proj": projection. "HL": hyper-complex layer. $\mathbb{H}$: hyper-complex space, i.e., suprasphere. "$\otimes$": Hamilton product. (Best viewed in color.)
  • Figure 5: Results on different dimensions of hidden space $r$. (Best viewed in color.)
  • ...and 1 more figure
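
The "$\otimes$" in Figure 4 denotes the Hamilton product. As a minimal sketch, assuming the four-dimensional (quaternion) case, the product of two hyper-complex numbers $a + b\,i + c\,j + d\,k$ can be written as below. The function name `hamilton_product` and the tuple layout `(a, b, c, d)` are choices made for this example; how the paper's hyper-complex layer uses the product to generate projection weights is not shown here.

```python
def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples,
    representing a + b*i + c*j + d*k."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


i = (0, 1, 0, 0)
j = (0, 0, 1, 0)

ij = hamilton_product(i, j)  # equals k
ji = hamilton_product(j, i)  # equals -k: the product is not commutative
```

Because every output component combines all four components of both inputs, a layer built on this product ties its weights together rather than learning them independently, which is the general reason hyper-complex layers are parameter-efficient.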