SAM2LoRA: Composite Loss-Guided, Parameter-Efficient Finetuning of SAM2 for Retinal Fundus Segmentation

Sayan Mandal, Divyadarshini Karthikeyan, Manas Paldhe

TL;DR

Problem: Efficiently adapting a large foundation model to domain-specific retinal fundus segmentation under cross-dataset settings. Approach: Apply Low-Rank Adaptation (LoRA) to both the image encoder and the mask decoder of SAM2, guided by a composite loss that combines segmentation BCE, SoftDice, and FocalTversky terms, with the LoRA scaling set to $\alpha = 2 \times \text{rank}$. Contributions: Demonstrate state-of-the-art Dice and AUC across 11 fundus datasets while updating under 5% of parameters; provide ablations over LoRA rank, module placement, and loss function, plus a prompt-mode analysis showing robustness across real-world prompting scenarios. Significance: Enables efficient, robust, cross-dataset fundus segmentation suitable for clinical deployment, reducing training overhead without sacrificing performance.
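To make the LoRA scaling rule concrete, here is a minimal NumPy sketch of a single linear projection with a low-rank adapter. It is not the authors' implementation: the class name `LoRALinear`, the initialization scheme, and the `trainable_fraction` helper are hypothetical, and the `alpha / rank` scaling is assumed from the standard LoRA formulation. Under that assumption, setting alpha to 2 × rank keeps the effective scale at 2 regardless of rank.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, rng):
        out_dim, in_dim = weight.shape
        self.weight = weight          # pretrained weight, kept frozen
        self.rank = rank
        self.alpha = 2 * rank         # scaling rule stated in the TL;DR
        # Assumed init (common LoRA convention): small random A, zero B,
        # so the adapted layer starts out identical to the frozen layer.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))

    def __call__(self, x):
        scale = self.alpha / self.rank            # equals 2 for every rank
        frozen = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T        # low-rank path, rank-sized bottleneck
        return frozen + scale * update

    def trainable_fraction(self):
        """Share of this layer's parameters that live in the adapter."""
        lora = self.A.size + self.B.size
        return lora / (lora + self.weight.size)


rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
layer = LoRALinear(W, rank=4, rng=rng)
x = rng.normal(size=(3, 64))
y = layer(x)   # same as x @ W.T until B is trained away from zero
```

The per-layer fraction printed by `trainable_fraction` depends on the layer size and rank chosen here and is not meant to reproduce the paper's under-5% figure, which is measured over the whole model.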

Abstract

We propose SAM2LoRA, a parameter-efficient fine-tuning strategy that adapts the Segment Anything Model 2 (SAM2) for fundus image segmentation. SAM2 employs a masked autoencoder-pretrained Hierarchical Vision Transformer for multi-scale feature decoding, enabling rapid inference in low-resource settings; however, fine-tuning remains challenging. To address this, SAM2LoRA integrates a low-rank adapter into both the image encoder and mask decoder, so that fewer than 5% of the original model's parameters are trained. Our analysis indicates that for cross-dataset fundus segmentation tasks, a composite loss function combining segmentation BCE, SoftDice, and FocalTversky losses is essential for optimal network tuning. Evaluated on 11 challenging fundus segmentation datasets, SAM2LoRA demonstrates high performance in both blood vessel and optic disc segmentation under cross-dataset training conditions. It reaches Dice scores of up to 0.86 and 0.93 for blood vessel and optic disc segmentation, respectively, and AUC values of up to 0.98 and 0.99, achieving state-of-the-art performance while substantially reducing training overhead.
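The composite loss named in the abstract can be sketched as a weighted sum of its three terms. The version below is an illustration under stated assumptions, not the paper's formulation: the equal term weights, the Tversky weights `fn_weight` and `fp_weight`, the focal exponent `gamma`, and all function names are hypothetical, since the abstract does not give them.

```python
import numpy as np


def bce_loss(p, y, eps=1e-7):
    """Pixel-wise binary cross-entropy on predicted probabilities p."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def soft_dice_loss(p, y, eps=1e-7):
    """1 - soft Dice overlap, computed on probabilities rather than hard masks."""
    inter = np.sum(p * y)
    return float(1 - (2 * inter + eps) / (np.sum(p) + np.sum(y) + eps))


def focal_tversky_loss(p, y, fn_weight=0.7, fp_weight=0.3, gamma=0.75, eps=1e-7):
    """(1 - Tversky index) ** gamma; the weights and gamma are assumed values."""
    tp = np.sum(p * y)
    fn = np.sum((1 - p) * y)
    fp = np.sum(p * (1 - y))
    tversky = (tp + eps) / (tp + fn_weight * fn + fp_weight * fp + eps)
    # Clamp at zero so rounding error cannot produce a negative base.
    return float(max(0.0, 1 - tversky) ** gamma)


def composite_loss(p, y, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; equal weights are an assumption."""
    w_bce, w_dice, w_ft = weights
    return (w_bce * bce_loss(p, y)
            + w_dice * soft_dice_loss(p, y)
            + w_ft * focal_tversky_loss(p, y))


target = np.array([[0.0, 1.0], [1.0, 1.0]])
perfect = target.copy()      # prediction identical to the mask
inverted = 1.0 - target      # prediction wrong at every pixel
loss_perfect = composite_loss(perfect, target)
loss_inverted = composite_loss(inverted, target)
```

The overlap-based terms (SoftDice, FocalTversky) are less sensitive to class imbalance than BCE alone, which is one plausible reason to combine them for thin structures such as vessels; the abstract itself only reports that the combination matters.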

Paper Structure

This paper contains 14 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: SAM2 LoRA Framework Architecture: LoRA is integrated into every projection layer within the transformer blocks' attention modules.
  • Figure 2: Side-by-side comparison of segmentation masks: ground truth versus predictions from SAM2LoRA at LoRA rank 32.
  • Figure 3: Dice scores for different LoRA ranks compared across datasets aggregated over all prompt modes.
  • Figure 4: Vessel Segmentation Dice performance under varying prompt scenarios, reported at each ablation mode’s best-performing LoRA rank, with ablation modes defined as in the ablation section.
  • Figure 5: Optic Disc Segmentation Dice performance under varying prompt scenarios, reported at each ablation mode’s best-performing LoRA rank, with ablation modes defined as in the ablation section.