VerSe: Integrating Multiple Queries as Prompts for Versatile Cardiac MRI Segmentation
Bangwei Guo, Meng Ye, Yunhe Gao, Bingyu Xin, Leon Axel, Dimitris Metaxas
TL;DR
VerSe addresses the gap between automatic cardiac MRI segmentation and clinical precision by unifying automatic and interactive approaches through multi-query prompts. It introduces learnable object queries $X_o$ and a combined click prompt $X_c$ consisting of sparse positional queries $X_s$ and semantic feature queries $X_f$, all processed by a shared backbone with foreground-background masked attention and multi-scale residuals. Trained on nine datasets across cardiac MRI and out-of-distribution domains, VerSe achieves competitive automatic performance and state-of-the-art interactive performance, with strong generalization to BraTS and OAIZIB. The framework, implemented with UTNet-based encoding and a transformer decoder, demonstrates versatile, efficient segmentation suitable for large-scale clinical deployment, with code available at https://github.com/bangwayne/Verse. It advances human-in-the-loop medical image segmentation by effectively fusing machine priors with expert prompts.
Abstract
Despite advances in learning-based image segmentation approaches, accurate segmentation of cardiac structures from magnetic resonance imaging (MRI) remains a critical challenge. While existing automatic segmentation methods have shown promise, they still require extensive manual correction of their results by human experts, particularly in complex regions such as the basal and apical parts of the heart. Recent efforts have focused on developing interactive image segmentation methods that enable human-in-the-loop learning. However, these methods are semi-automatic and inefficient because they rely on click-based prompts, especially for 3D cardiac MRI volumes. To address these limitations, we propose VerSe, a Versatile Segmentation framework that unifies automatic and interactive segmentation through multiple queries. Our key innovation lies in the joint learning of object and click queries as prompts for a shared segmentation backbone. VerSe supports both fully automatic segmentation, through object queries, and interactive mask refinement, by providing click queries when needed. With the proposed integrated prompting scheme, VerSe demonstrates significant improvements in performance and efficiency over existing methods on both cardiac MRI and out-of-distribution medical imaging datasets. The code is available at https://github.com/bangwayne/Verse.
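To make the multi-query idea concrete, the following is a minimal NumPy sketch of how object queries and click queries might be assembled into one prompt set and passed through a masked cross-attention step. It is not the authors' implementation. The function names (`positional_encoding`, `build_queries`, `masked_cross_attention`), the sinusoidal encoding for $X_s$, the sampling of $X_f$ from a feature map at the click location, the concatenation used to form $X_c$, and the generic boolean attention mask are all assumptions made for illustration. Click polarity (positive or negative) and the way VerSe derives its foreground-background masks are omitted because this section does not specify them.

```python
import numpy as np


def positional_encoding(coords, dim):
    """Sinusoidal encoding of (row, col) clicks -> (n_clicks, dim). Assumed form of X_s."""
    assert dim % 4 == 0, "dim must be divisible by 4 (sin and cos for two coordinates)"
    coords = np.asarray(coords, dtype=np.float64).reshape(-1, 2)
    n_freqs = dim // 4
    freqs = 1.0 / (10000.0 ** (np.arange(n_freqs) / n_freqs))
    angles = coords[:, :, None] * freqs[None, None, :]          # (n, 2, n_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], -1)  # (n, 2, 2 * n_freqs)
    return enc.reshape(len(coords), 4 * n_freqs)


def build_queries(x_o, feature_map, clicks):
    """Assemble the prompt set.

    x_o:         (n_obj, d) object queries (learnable in a real model)
    feature_map: (H, W, d) image features from the encoder
    clicks:      list of (row, col) user clicks; empty for automatic mode
    """
    if len(clicks) == 0:
        return x_o                                   # fully automatic: object queries only
    rows, cols = np.asarray(clicks).T
    x_s = positional_encoding(clicks, x_o.shape[1])  # sparse positional queries (assumed)
    x_f = feature_map[rows, cols]                    # semantic feature queries (assumed)
    x_c = np.concatenate([x_s, x_f], axis=0)         # combined click prompt (assumed concat)
    return np.concatenate([x_o, x_c], axis=0)        # interactive: object + click queries


def masked_cross_attention(queries, feature_map, allowed):
    """Cross-attention from queries to pixels, restricted by a boolean mask.

    allowed: (n_q, H * W) array; True where a query may attend.
    Returns the updated queries and the attention weights.
    """
    h, w, d = feature_map.shape
    keys = feature_map.reshape(h * w, d)
    logits = queries @ keys.T / np.sqrt(d)
    allowed = allowed.copy()
    allowed[~allowed.any(axis=1)] = True             # an empty mask falls back to full attention
    logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ keys, weights


rng = np.random.default_rng(0)
x_o = rng.normal(size=(3, 8))                        # 3 object queries, d = 8
fmap = rng.normal(size=(4, 5, 8))                    # toy 4 x 5 feature map
auto_queries = build_queries(x_o, fmap, [])
click_queries = build_queries(x_o, fmap, [(1, 2), (3, 0)])
```

In this sketch the same attention routine serves both modes: automatic segmentation passes only the object queries, while interactive refinement appends click-derived queries to the same set, which mirrors the shared-backbone design described above.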
