Fine-tuning Segment Anything for Real-Time Tumor Tracking in Cine-MRI
Valentin Boussot, Cédric Hémon, Jean-Claude Nunes, Jean-Louis Dillenseger
TL;DR
This work tackles real-time tumor tracking in cine-MRI under severe data scarcity for the TrackRAD2025 challenge by evaluating two complementary approaches: unsupervised IMPACT-based registration and foundation-model-based segmentation using SAM. Because the real-time constraint precludes heavy optimization, the authors select a SAM2.1 b+ segmentation pipeline, fine-tuned on a small annotated subset with 1024×1024 patches and 8-frame prompts, achieving high validation accuracy and robust generalization. On the hidden test set, the method reaches a Dice Similarity Coefficient (DSC) of 0.8794 and ranks 6th, illustrating the potential of adapting foundation models to MRI-guided radiotherapy under limited labels. The findings suggest that a fine-tuned, prompt-driven foundation model can deliver accurate real-time tumor tracking across anatomical sites and MRI field strengths, offering a practical approach for online dose adaptation in radiotherapy.
Abstract
In this work, we address the TrackRAD2025 challenge of real-time tumor tracking in cine-MRI sequences of the thoracic and abdominal regions under strong data scarcity constraints. Two complementary strategies were explored: (i) unsupervised registration with the IMPACT similarity metric and (ii) foundation model-based segmentation leveraging SAM2.1 and its recent variants through prompt-based interaction. Because of the one-second runtime constraint, the SAM-based method was ultimately selected. The final configuration used SAM2.1 b+ with mask-based prompts from the first annotated slice, fine-tuned solely on the small labeled subset from TrackRAD2025. Training was configured to limit overfitting, using 1024×1024 patches (batch size 1), standard augmentations, and a balanced Dice + IoU loss. A low uniform learning rate (0.0001) was applied to all modules (prompt encoder, decoder, Hiera backbone) to preserve generalization while adapting to annotator-specific styles. Training lasted 300 epochs (approximately 12 h on an RTX A6000 with 48 GB of memory). The same inference strategy was applied across all anatomical sites and MRI field strengths. Test-time augmentation was considered but ultimately discarded because it yielded negligible performance gains. The final model was the fine-tuned checkpoint with the highest Dice Similarity Coefficient (DSC) on the validation set. On the hidden test set, the model reached a DSC of 0.8794, ranking 6th overall in the TrackRAD2025 challenge. These results highlight the strong potential of foundation models for accurate, real-time tumor tracking in MRI-guided radiotherapy.
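The abstract names a balanced Dice + IoU loss without giving its exact form. The following is a minimal sketch of one common formulation, not the authors' implementation. It assumes soft (probability-based) Dice and IoU terms, equal weights as the reading of "balanced", and a small smoothing constant. The function name `dice_iou_loss` and the parameters `dice_weight` and `eps` are hypothetical.

```python
def dice_iou_loss(probs, target, dice_weight=0.5, eps=1e-6):
    """Soft Dice + soft IoU loss over flattened pixel values.

    probs  : predicted foreground probabilities in [0, 1]
    target : ground-truth labels (0 or 1), same length as probs
    dice_weight : assumed weighting; 0.5 gives equal weight to both terms
    eps    : assumed smoothing constant to avoid division by zero
    """
    if len(probs) != len(target):
        raise ValueError("probs and target must have the same length")

    # Soft intersection and per-mask sums.
    intersection = sum(p * t for p, t in zip(probs, target))
    probs_sum = sum(probs)
    target_sum = sum(target)

    # Dice = 2|A ∩ B| / (|A| + |B|); IoU = |A ∩ B| / |A ∪ B|.
    dice = (2.0 * intersection + eps) / (probs_sum + target_sum + eps)
    iou = (intersection + eps) / (probs_sum + target_sum - intersection + eps)

    # Each term is turned into a loss (1 - score) and then combined.
    return dice_weight * (1.0 - dice) + (1.0 - dice_weight) * (1.0 - iou)


# Usage: one correct foreground pixel plus one false positive.
loss = dice_iou_loss([1.0, 1.0, 0.0, 0.0], [1, 0, 0, 0])
```

In a real fine-tuning run the same arithmetic would be applied to framework tensors so that gradients flow back to the model. Plain Python sequences are used here only to keep the sketch dependency-free.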
