SAM2LoRA: Composite Loss-Guided, Parameter-Efficient Finetuning of SAM2 for Retinal Fundus Segmentation
Sayan Mandal, Divyadarshini Karthikeyan, Manas Paldhe
TL;DR
Problem: Efficiently adapting a large foundation model to domain-specific retinal fundus segmentation under cross-dataset settings.
Approach: Apply Low-Rank Adaptation (LoRA), with scaling factor $\alpha = 2 \times \text{rank}$, to both the image encoder and the mask decoder of SAM2, guided by a composite loss that combines segmentationBCE, SoftDice, and FocalTversky terms.
Contributions: Demonstrate state-of-the-art Dice and AUC across 11 fundus datasets while updating under 5% of parameters; provide extensive ablations (LoRA ranks, module placement, loss functions) and a prompt-mode analysis to show robustness across real-world prompting scenarios.
Significance: Enables efficient, robust, cross-dataset fundus segmentation suitable for clinical deployment, reducing training overhead without sacrificing performance.
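To make the LoRA scaling concrete, the following is a minimal NumPy sketch of a single linear layer with a low-rank update, assuming the standard LoRA formulation $h = Wx + (\alpha / r) BAx$ with $\alpha = 2 \times \text{rank}$ as stated above. The class name `LoRALinear`, the initialization constants, and the layer sizes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical helper for illustration; not taken from the SAM2LoRA code.
    """

    def __init__(self, weight, rank, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        out_dim, in_dim = weight.shape
        self.weight = weight            # pretrained weight, kept frozen
        self.rank = rank
        self.alpha = 2 * rank           # scaling rule from the TL;DR
        # Assumed init: small Gaussian A and zero B, so the update starts at 0.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))

    def __call__(self, x):
        scale = self.alpha / self.rank  # equals 2 under alpha = 2 * rank
        return x @ self.weight.T + scale * (x @ self.A.T) @ self.B.T

    def trainable_fraction(self):
        """Share of this layer's parameters that LoRA would update."""
        lora_params = self.A.size + self.B.size
        return lora_params / (self.weight.size + lora_params)
```

With a 256 x 256 weight and rank 4, only the two thin matrices `A` and `B` are trainable, which is a small share of that layer's parameters. The whole-model figure reported above depends on which SAM2 modules receive adapters.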
Abstract
We propose SAM2LoRA, a parameter-efficient fine-tuning strategy that adapts the Segment Anything Model 2 (SAM2) for fundus image segmentation. SAM2 employs a masked autoencoder-pretrained Hierarchical Vision Transformer for multi-scale feature decoding, enabling rapid inference in low-resource settings; however, fine-tuning remains challenging. To address this, SAM2LoRA integrates low-rank adapters into both the image encoder and the mask decoder, so that fewer than 5% of the original trainable parameters need to be updated. Our analysis indicates that for cross-dataset fundus segmentation tasks, a composite loss function combining segmentationBCE, SoftDice, and FocalTversky losses is essential for optimal network tuning. Evaluated on 11 challenging fundus segmentation datasets, SAM2LoRA demonstrates high performance in both blood vessel and optic disc segmentation under cross-dataset training conditions. It reaches Dice scores of up to 0.86 and 0.93 for blood vessel and optic disc segmentation, respectively, with AUC values of up to 0.98 and 0.99 for the same two tasks, achieving state-of-the-art performance while substantially reducing training overhead.
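As a rough illustration of how a composite loss of this kind can be assembled, the NumPy sketch below sums a binary cross-entropy term, a soft Dice term, and a focal Tversky term over predicted probabilities. The equal weighting, the Tversky weights (0.7 for false negatives, 0.3 for false positives), and the focal exponent 0.75 are assumptions chosen for illustration; the abstract does not state the values used in SAM2LoRA.

```python
import numpy as np


def bce_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy on probabilities in [0, 1]."""
    p = np.clip(prob, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))


def soft_dice_loss(prob, target, eps=1e-7):
    """1 - soft Dice coefficient, computed on probabilities (no thresholding)."""
    intersection = np.sum(prob * target)
    return float(1 - (2 * intersection + eps) / (np.sum(prob) + np.sum(target) + eps))


def focal_tversky_loss(prob, target, fn_weight=0.7, fp_weight=0.3, gamma=0.75, eps=1e-7):
    """(1 - Tversky index) ** gamma; hyperparameters here are assumed values."""
    tp = np.sum(prob * target)
    fn = np.sum((1 - prob) * target)
    fp = np.sum(prob * (1 - target))
    tversky = (tp + eps) / (tp + fn_weight * fn + fp_weight * fp + eps)
    return float(max(1 - tversky, 0.0) ** gamma)


def composite_loss(prob, target, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; equal weights are an assumption."""
    w_bce, w_dice, w_ft = weights
    return (w_bce * bce_loss(prob, target)
            + w_dice * soft_dice_loss(prob, target)
            + w_ft * focal_tversky_loss(prob, target))
```

The three terms pull in complementary directions: cross-entropy scores each pixel independently, Dice rewards region overlap, and the Tversky term can penalize missed foreground more heavily than false alarms, which matters for thin structures such as vessels.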
