Testing the Segment Anything Model on radiology data
José Guilherme de Almeida, Nuno M. Rodrigues, Sara Silva, Nickolas Papanikolaou
TL;DR
This study critically evaluates SAM, a pioneering foundation model for segmentation, on MRI data to probe its zero-shot capabilities in radiology. Applying standard and seeded inference modes and four segment-selection heuristics to MRI datasets from the Medical Segmentation Decathlon and ProstateX, the authors quantify performance with Dice and related metrics. They find that SAM generally underperforms on radiology segmentation: the best Dice is 65% at IoU > 0.25 for the left atrium, results for other tasks are substantially poorer, and seeds do not improve results. The work highlights the domain gap between foundation models trained on natural images and radiology data, advising caution when using SAM in clinical settings and pointing to domain-specific adaptation and finetuning as routes to improvement.
Abstract
Deep learning models trained on large amounts of data have recently become an effective approach to predictive problem solving. They are known as "foundation models" because they can serve as fundamental tools for other applications. Image classification models (earlier) and large language models (more recently) led the way; the recently proposed Segment Anything Model (SAM) stands as the first foundation model for image segmentation, trained on over 10 million images and over 1 billion masks. However, the question remains: what are the limits of this foundation? Given that magnetic resonance imaging (MRI) is an important diagnostic method, we sought to understand whether SAM could be used for a few zero-shot segmentation tasks on MRI data. In particular, we wanted to know whether selecting masks from the pool of SAM predictions could lead to good segmentations. Here, we provide a critical assessment of the performance of SAM on MRI data. We show that, while SAM is acceptable in a very limited set of cases, the overall trend implies that these models are insufficient for MRI segmentation across the whole volume, although they can provide good segmentations in a few specific slices. More importantly, we note that while foundation models trained on natural images are set to become key aspects of predictive modelling, they may prove ineffective when used on other imaging modalities.
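
As a rough illustration of what "selecting masks from the pool of SAM predictions" and scoring them with Dice and IoU can look like, the sketch below computes both overlap metrics for binary masks and picks the candidate with the highest IoU against a reference mask. This is a minimal sketch under stated assumptions, not the selection heuristics used in the study: the function names (`dice`, `iou`, `select_best_mask`), the oracle-style selection rule, and the toy 4x4 masks are all hypothetical and chosen only for illustration.

```python
import numpy as np


def dice(pred, target):
    """Dice coefficient between two binary masks: 2|A and B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / total)


def iou(pred, target):
    """Intersection over union between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def select_best_mask(candidates, target):
    """Hypothetical oracle selection: index and IoU of the best candidate."""
    scores = [iou(mask, target) for mask in candidates]
    best = int(np.argmax(scores))
    return best, scores[best]


# Toy 4x4 example (made-up data, not from the study).
target = np.zeros((4, 4), dtype=bool)
target[1:3, 1:3] = True      # 4 foreground pixels

cand_a = np.zeros((4, 4), dtype=bool)
cand_a[1:3, 1:4] = True      # 6 pixels, 4 of which overlap the target

cand_b = np.zeros((4, 4), dtype=bool)
cand_b[0, 0] = True          # 1 pixel, no overlap

best_idx, best_iou = select_best_mask([cand_a, cand_b], target)
best_dice = dice([cand_a, cand_b][best_idx], target)
```

In this toy case the first candidate is selected, with an IoU of 4/6 and a Dice of 0.8. A threshold such as IoU > 0.25 could then be applied to `best_iou` to decide whether a slice counts as having a usable segmentation; how the study applies its threshold is not specified here.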
