Landmark Detection for Medical Images using a General-purpose Segmentation Model
Ekaterina Stansfield, Jennifer A. Mitterer, Abdulrahman Altahhan
TL;DR
This work tackles automatic anatomical landmark detection in orthopaedic pelvic radiographs by combining YOLO-based landmark localization with SAM-based pixel-level segmentation. The two-model pipeline enables scalable, resource-efficient landmarking, evaluated first on eight landmarks and then on an expanded set of 72 landmarks plus 18 outlines/patches. Results show that YOLO robustly localizes landmarks, while SAM provides precise segmentation when prompted with YOLO-generated bounding boxes, achieving median landmark errors of around 1.7–2.3 mm and strong IoU for patches. The approach is practically feasible with moderate compute, and the authors propose iterative, human-in-the-loop refinement to further improve coverage and accuracy as more labeled data become available.
Abstract
Radiographic images are a cornerstone of medical diagnostics in orthopaedics, with anatomical landmark detection serving as a crucial intermediate step for information extraction. General-purpose foundational segmentation models, such as SAM (Segment Anything Model), do not support landmark segmentation out of the box and require prompts to function. However, in medical imaging, the prompts for landmarks are highly specific. Since SAM has not been trained to recognize such landmarks, it cannot generate accurate landmark segmentations for diagnostic purposes. Even MedSAM, a medically adapted variant of SAM, has been trained to identify larger anatomical structures, such as organs and their parts, and lacks the fine-grained precision required for orthopaedic pelvic landmarks. To address this limitation, we propose leveraging another general-purpose, non-foundational model: YOLO. YOLO excels in object detection and can provide bounding boxes that serve as input prompts for SAM. While YOLO is efficient at detection, it is significantly outperformed by SAM in segmenting complex structures. In combination, these two models form a reliable pipeline capable of segmenting not only a small pilot set of eight anatomical landmarks but also an expanded set of 72 landmarks and 16 regions with complex outlines, such as the femoral cortical bone and the pelvic inlet. By using YOLO-generated bounding boxes to guide SAM, we trained the hybrid model to accurately segment orthopaedic pelvic radiographs. Our results show that the proposed combination of YOLO and SAM yields excellent performance in detecting anatomical landmarks and intricate outlines in orthopaedic pelvic radiographs.
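The hand-off from detector to segmenter described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes detections arrive in the normalized centre format used by YOLO label files (cx, cy, w, h in [0, 1]) and that the segmenter expects a pixel-space box prompt in [x0, y0, x1, y1] order, as SAM's box prompt does. The names `YoloBox`, `to_xyxy_prompt`, `landmark_error_mm`, and the `pad` parameter are hypothetical, and the pixel spacing used to express an error in millimetres is assumed to come from the image metadata.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class YoloBox:
    """One detection in normalized (cx, cy, w, h) format (hypothetical container)."""
    class_id: int
    cx: float
    cy: float
    w: float
    h: float


def to_xyxy_prompt(box, img_w, img_h, pad=0.0):
    """Convert a normalized centre box to a pixel-space [x0, y0, x1, y1] prompt.

    `pad` optionally enlarges the box by a fraction of its size (an assumption,
    not something the abstract specifies). The result is clipped to the image.
    """
    half_w = box.w * img_w * (1.0 + pad) / 2.0
    half_h = box.h * img_h * (1.0 + pad) / 2.0
    cx, cy = box.cx * img_w, box.cy * img_h
    x0 = max(0.0, cx - half_w)
    y0 = max(0.0, cy - half_h)
    x1 = min(float(img_w), cx + half_w)
    y1 = min(float(img_h), cy + half_h)
    return [x0, y0, x1, y1]


def landmark_error_mm(pred_xy, true_xy, pixel_spacing_mm):
    """Euclidean distance between two pixel coordinates, scaled to millimetres."""
    return math.dist(pred_xy, true_xy) * pixel_spacing_mm


if __name__ == "__main__":
    detection = YoloBox(class_id=3, cx=0.5, cy=0.5, w=0.2, h=0.1)
    print(to_xyxy_prompt(detection, img_w=1000, img_h=2000))
    print(landmark_error_mm((0, 0), (3, 4), pixel_spacing_mm=0.5))
```

In a full pipeline, the returned box would be passed to the segmenter once per detected class, so that each landmark or outline receives its own prompt.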
