VerSe: Integrating Multiple Queries as Prompts for Versatile Cardiac MRI Segmentation

Bangwei Guo, Meng Ye, Yunhe Gao, Bingyu Xin, Leon Axel, Dimitris Metaxas

TL;DR

VerSe addresses the gap between automatic cardiac MRI segmentation and clinical precision by unifying automatic and interactive approaches through multi-query prompts. It introduces learnable object queries $X_o$ and a combined click prompt $X_c$ consisting of sparse positional queries $X_s$ and semantic feature queries $X_f$, all processed by a shared backbone with foreground-background masked attention and multi-scale residuals. Trained on nine datasets across cardiac MRI and out-of-distribution domains, VerSe achieves competitive automatic performance and state-of-the-art interactive performance, with strong generalization to BraTS and OAIZIB. The framework, implemented with UTNet-based encoding and a transformer decoder, demonstrates versatile, efficient segmentation suitable for large-scale clinical deployment, with code available at https://github.com/bangwayne/Verse. It advances human-in-the-loop medical image segmentation by effectively fusing machine priors with expert prompts.

Abstract

Despite advances in learning-based image segmentation approaches, the accurate segmentation of cardiac structures from magnetic resonance imaging (MRI) remains a critical challenge. While existing automatic segmentation methods have shown promise, they still require extensive manual correction of the segmentation results by human experts, particularly in complex regions such as the basal and apical parts of the heart. Recent efforts have been made to develop interactive image segmentation methods that enable human-in-the-loop learning. However, they are semi-automatic and inefficient, due to their reliance on click-based prompts, especially for 3D cardiac MRI volumes. To address these limitations, we propose VerSe, a Versatile Segmentation framework that unifies automatic and interactive segmentation through multiple queries. Our key innovation lies in the joint learning of object and click queries as prompts for a shared segmentation backbone. VerSe supports both fully automatic segmentation, through object queries, and interactive mask refinement, by providing click queries when needed. With the proposed integrated prompting scheme, VerSe demonstrates significant improvement in performance and efficiency over existing methods, on both cardiac MRI and out-of-distribution medical imaging datasets. The code is available at https://github.com/bangwayne/Verse.

Paper Structure

This paper contains 16 sections, 9 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Illustration of the proposed versatile segmentation framework. Our model accepts an object query, such as left ventricle, to automatically segment the target in the image. If the initial segmentation is unsatisfactory, users can refine the mask by providing corrective clicks until the output mask meets clinical accuracy requirements.
  • Figure 2: Overview of the VerSe architecture. In stage I, object queries are used to automatically segment a target in the image. In stage II, the user provides clicks as prompts to refine the initial segmentation mask. VerSe also supports a purely interactive mode, where the initial mask is empty and the object queries are not activated. The image encoder, transformer decoder, and mask decoder are shared across all stages. Implementation details are described in Sec. \ref{sec:implementation}.
  • Figure 3: (a) Process of generating the semantic feature query $\boldsymbol{X}_{f0}$ for a specified click point $P_0$ at scale $s$. The original click $P_0$ is mapped to the coordinates $P_0'$ on the downscaled feature map. A feature patch centered at $P_0'$ undergoes average pooling, and the resulting feature is transformed via an MLP to produce $\boldsymbol{X}_{f0}$. (b) Transformer decoder at the $l$-th layer. Object queries $\boldsymbol{X}_{ol}$, positive click queries $\boldsymbol{X}_{pl}$, and negative click queries $\boldsymbol{X}_{nl}$ first interact with the image features $\boldsymbol{F}_{l}$ to capture the foreground target and background context. These queries are then concatenated to update the image features. $\boldsymbol{X}_{fl}$, $\boldsymbol{X}_{sl}$, and $\boldsymbol{X}_{cl}$ denote semantic feature queries, sparse positional queries, and click queries at the $l$-th layer, respectively.
  • Figure 4: Convergence analysis for models tested on four types of segmentation targets. The Combined bSSFP Dataset, including ACDC, M&Ms, M&Ms-2, and MyoPS++ (bSSFP), focuses on LV, Myo, and RV structures. VerSe demonstrates consistent accuracy improvements across all tasks as the number of clicks increases.
  • Figure 5: Segmentation results of VerSe on different medical image segmentation tasks. First row: Automatic segmentation of three structures on cardiac cine MRI. Second row: Interactive refinement of myocardial edema segmentation on cardiac T2-weighted MRI. Third row: Interactive segmentation of a tumor on out-of-distribution brain MRI.
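
The semantic feature query described in the Figure 3(a) caption (map the click to the feature map, average-pool a patch around it, pass the result through an MLP) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the 3×3 patch size, the floor-based coordinate mapping, the clipping at image borders, and the two-layer ReLU MLP are all assumptions made for the example.

```python
import numpy as np


def map_click_to_feature(click_xy, image_size, feat_size):
    """Map a click P0 (x, y) in image pixels to P0' on a downscaled feature map.

    Hypothetical helper: floor-based scaling is an assumption of this sketch.
    """
    x, y = click_xy
    img_h, img_w = image_size
    feat_h, feat_w = feat_size
    xs = min(int(x * feat_w / img_w), feat_w - 1)
    ys = min(int(y * feat_h / img_h), feat_h - 1)
    return xs, ys


def semantic_feature_query(feat, click_xy, image_size, patch=3, mlp=None):
    """Build one semantic feature query from a feature map of shape (C, Hs, Ws).

    patch: side length of the pooled window (assumed value, not from the paper).
    mlp:   optional tuple (w1, b1, w2, b2) for an assumed two-layer ReLU MLP;
           if None, the pooled feature is returned unchanged.
    """
    _, feat_h, feat_w = feat.shape
    xs, ys = map_click_to_feature(click_xy, image_size, (feat_h, feat_w))

    # Window centered at P0', clipped at the feature-map borders.
    r = patch // 2
    y0, y1 = max(ys - r, 0), min(ys + r + 1, feat_h)
    x0, x1 = max(xs - r, 0), min(xs + r + 1, feat_w)

    # Average pooling over the spatial window gives one C-dimensional vector.
    pooled = feat[:, y0:y1, x0:x1].mean(axis=(1, 2))
    if mlp is None:
        return pooled

    w1, b1, w2, b2 = mlp
    hidden = np.maximum(pooled @ w1 + b1, 0.0)  # ReLU
    return hidden @ w2 + b2


# Usage: a random 16-channel feature map at 1/8 resolution of a 64x64 image.
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8, 8))
weights = (rng.normal(size=(16, 32)), np.zeros(32),
           rng.normal(size=(32, 16)), np.zeros(16))
query = semantic_feature_query(features, (40, 20), (64, 64), mlp=weights)
print(query.shape)
```

In this sketch one query is produced per click and per scale; how VerSe combines such queries with the sparse positional queries is described in the paper and the released code, not here.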