Not Quite Anything: Overcoming SAM's Limitations for 3D Medical Imaging
Keith Moore
TL;DR
This work targets brain MRI segmentation, where foundation models like SAM and SAM-2 underperform because the structures have weak, low-contrast boundaries. It introduces a compositional approach that freezes the foundation model's weights and feeds its 2D segmentation output, as an additional input channel alongside the MRI, to a lightweight 3D U-Net. This improves 3D segmentation without retraining the foundation model. Key innovations include constructing segmentation guesses from prompts or from DINO attention maps, smoothing guess edges with a clipped signed distance map, and a fast, semi-supervised training regime. The method achieves strong volumetric accuracy on basal ganglia segmentation and is robust to distribution shift in pediatric data, which makes it practical for longitudinal brain-volume studies with limited labels.
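The two most concrete steps in the TL;DR can be sketched as follows. The first smooths a hard guess with a clipped signed distance map. The second feeds the result to the 3D U-Net as an extra channel. This is a minimal sketch under our own assumptions, not the paper's implementation. It assumes the per-slice 2D foundation-model outputs have already been stacked into a binary (D, H, W) volume. The sign convention (positive inside), the clip radius, the rescaling to [0, 1], and the z-score normalization of the MRI are also assumptions, and `smooth_guess` and `make_unet_input` are hypothetical names. Distances come from SciPy's `ndimage.distance_transform_edt`, whose `sampling` argument accepts the voxel spacing for anisotropic scans.

```python
import numpy as np
from scipy import ndimage


def smooth_guess(guess_mask, clip=3.0, spacing=None):
    """Hypothetical helper: soften a binary guess with a clipped signed distance map.

    Assumed convention: the signed distance is positive inside the guess and
    negative outside. It is clipped to [-clip, clip] (in the units of `spacing`,
    voxels by default) and rescaled to [0, 1], so voxels deep inside map to 1,
    voxels far outside map to 0, and the hard edge becomes a ramp.
    """
    m = np.asarray(guess_mask, dtype=bool)
    if not m.any():  # empty guess: no information, return all zeros
        return np.zeros(m.shape, dtype=np.float32)
    if m.all():      # guess covers everything: return all ones
        return np.ones(m.shape, dtype=np.float32)
    # Inside voxels: distance to the nearest outside voxel (0 elsewhere).
    inside = ndimage.distance_transform_edt(m, sampling=spacing)
    # Outside voxels: distance to the nearest inside voxel (0 elsewhere).
    outside = ndimage.distance_transform_edt(~m, sampling=spacing)
    sdm = np.clip(inside - outside, -clip, clip)
    return ((sdm + clip) / (2.0 * clip)).astype(np.float32)


def make_unet_input(mri, guess_mask, clip=3.0, spacing=None):
    """Hypothetical helper: stack the MRI and the smoothed guess as two channels."""
    mri = np.asarray(mri, dtype=np.float32)
    mri = (mri - mri.mean()) / (mri.std() + 1e-8)  # z-score normalization (assumed)
    guess = smooth_guess(guess_mask, clip=clip, spacing=spacing)
    return np.stack([mri, guess], axis=0)  # shape (2, D, H, W), channels first
```

Clipping bounds the map so that only a band about `clip` voxels wide on each side of the guessed boundary carries graded values. Voxels beyond that band saturate at 0 or 1. The channel therefore acts as a confident hint away from the edge and becomes graded only near it, where a foundation-model guess on a low-contrast structure is least reliable.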
Abstract
Foundation segmentation models such as SAM and SAM-2 perform well on natural images but struggle with brain MRIs, where structures like the caudate and thalamus have low contrast and lack sharp boundaries. Rather than fine-tuning these models (as MedSAM does), we propose a compositional alternative in which the foundation model's output is treated as an additional input channel and passed alongside the MRI to highlight regions of interest. We generate SAM-2 prompts with a lightweight 3D U-Net previously trained on MRI segmentation. Because that U-Net may have been trained on a different dataset, its guesses are often imprecise but usually in the correct region. The edges of the resulting foundation-model segmentations are then smoothed to improve their alignment with the MRI. We also test prompt-free segmentation using DINO attention maps within the same framework. This compositional ("has-a") architecture leaves the foundation weights unmodified and adapts to domain shift without retraining the foundation model. It reaches about 96 percent volume accuracy on basal ganglia segmentation, which is sufficient for our study of longitudinal volume change. The approach is fast, label-efficient, and robust to out-of-distribution scans. We apply it to study inflammation-linked changes in sudden-onset pediatric OCD.
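The abstract turns coarse 3D U-Net guesses into SAM-2 prompts. That step can be sketched as below. This is a minimal sketch, not the authors' code. It assumes the coarse mask has shape (D, H, W) and is sliced along axis 0, and `slice_prompts`, `margin`, and `min_pixels` are hypothetical. The abstract does not say whether boxes, points, or both are used. For each slice, the sketch emits a padded bounding box and one positive point. These follow the layout that the original segment-anything `SamPredictor.predict` expects: `point_coords` as (x, y) pixel coordinates, `point_labels` with 1 for foreground, and `box` in XYXY order. We assume SAM-2's image predictor accepts the same layout.

```python
import numpy as np
from scipy import ndimage


def slice_prompts(coarse_mask, margin=3, min_pixels=10):
    """Hypothetical helper: per-slice SAM-style prompts from a coarse 3D mask.

    coarse_mask: boolean array of shape (D, H, W), sliced along axis 0, so in
    each slice rows are y and columns are x. Returns a dict mapping slice
    index -> {"point_coords", "point_labels", "box"}.
    """
    mask = np.asarray(coarse_mask, dtype=bool)
    _, height, width = mask.shape
    prompts = {}
    for z in range(mask.shape[0]):
        sl = mask[z]
        if sl.sum() < min_pixels:
            continue  # skip slices where the coarse guess is empty or tiny
        rows, cols = np.nonzero(sl)
        # Box in XYXY order, padded by `margin` pixels because the coarse
        # guess is imprecise, then clipped to the image bounds.
        box = np.array([
            max(cols.min() - margin, 0),
            max(rows.min() - margin, 0),
            min(cols.max() + margin, width - 1),
            min(rows.max() + margin, height - 1),
        ])
        # Positive point: the pixel farthest from the mask edge. A zero border
        # is padded on so the distance transform always has background.
        depth = ndimage.distance_transform_edt(np.pad(sl, 1))[1:-1, 1:-1]
        r, c = np.unravel_index(np.argmax(depth), depth.shape)
        prompts[z] = {
            "point_coords": np.array([[c, r]]),  # (x, y) pixel order
            "point_labels": np.array([1]),       # 1 marks a foreground point
            "box": box,
        }
    return prompts
```

Padding the box reflects the observation that the U-Net guesses are imprecise but usually in the correct region. The positive point is the deepest interior pixel rather than the centroid, because the centroid of a non-convex mask can fall outside it.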
