K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model

Bangwei Guo, Yunhe Gao, Meng Ye, Difei Gu, Yang Zhou, Leon Axel, Dimitris Metaxas

TL;DR

K-Prism addresses the fragmentation of medical image segmentation by unifying semantic priors, in-context exemplars, and interactive feedback within a single architecture. It encodes these heterogeneous knowledge sources into a dual-prompt representation consisting of 1-D sparse prompts and 2-D dense prompts, which are dynamically routed through a Mixture-of-Experts decoder to support semantic, in-context, and interactive segmentation. Across 18 public datasets spanning CT, MRI, X-ray, pathology, and ultrasound, K-Prism achieves state-of-the-art performance and strong generalization to external and unseen domains, while enabling efficient interactive refinement with minimal user input. This work positions K-Prism as a practical foundation for universal medical segmentation, capable of bridging clinical workflows and AI-assisted decision making, with code to be released upon publication.

Abstract

Medical image segmentation is fundamental to clinical decision-making, yet existing models remain fragmented. They are usually trained on single knowledge sources and specific to individual tasks, modalities, or organs. This fragmentation contrasts sharply with clinical practice, where experts seamlessly integrate diverse knowledge: anatomical priors from training, exemplar-based reasoning from reference cases, and iterative refinement through real-time interaction. We present K-Prism, a unified segmentation framework that mirrors this clinical flexibility by systematically integrating three knowledge paradigms: (i) semantic priors learned from annotated datasets, (ii) in-context knowledge from few-shot reference examples, and (iii) interactive feedback from user inputs like clicks or scribbles. Our key insight is that these heterogeneous knowledge sources can be encoded into a dual-prompt representation: 1-D sparse prompts defining what to segment and 2-D dense prompts indicating where to attend, which are then dynamically routed through a Mixture-of-Experts (MoE) decoder. This design enables flexible switching between paradigms and joint training across diverse tasks without architectural modifications. Comprehensive experiments on 18 public datasets spanning diverse modalities (CT, MRI, X-ray, pathology, ultrasound, etc.) demonstrate that K-Prism achieves state-of-the-art performance across semantic, in-context, and interactive segmentation settings. Code will be released upon publication.
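
To make the dual-prompt idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation (their code is not yet released). It assumes a 1-D sparse prompt is a feature vector, a 2-D dense prompt is a per-pixel weight map applied elementwise, and the MoE router is a softmax over linear gate scores. The names `moe_route`, `apply_dense_prompt`, `gate_weights`, and `toy_experts` are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of gate scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def moe_route(
    sparse_prompt: Vector,
    gate_weights: Sequence[Vector],
    experts: Sequence[Callable[[Vector], Vector]],
) -> Tuple[Vector, Vector]:
    """Mix expert outputs with softmax gate weights (assumed linear gate)."""
    # One gate score per expert, computed from the 1-D sparse prompt.
    weights = softmax([dot(w, sparse_prompt) for w in gate_weights])
    outputs = [expert(sparse_prompt) for expert in experts]
    dim = len(outputs[0])
    mixed = [
        sum(wt * out[i] for wt, out in zip(weights, outputs))
        for i in range(dim)
    ]
    return mixed, weights


def apply_dense_prompt(
    feature_map: Sequence[Vector], dense_prompt: Sequence[Vector]
) -> List[Vector]:
    """Weight a 2-D feature map by a 2-D dense prompt (assumed elementwise)."""
    return [
        [f * p for f, p in zip(f_row, p_row)]
        for f_row, p_row in zip(feature_map, dense_prompt)
    ]


# Toy usage: three hypothetical experts that just rescale the prompt.
toy_experts = [
    lambda v: [1.0 * x for x in v],
    lambda v: [2.0 * x for x in v],
    lambda v: [3.0 * x for x in v],
]
gate = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
mixed, weights = moe_route([1.0, 0.0], gate, toy_experts)

# A dense prompt that attends only to the left column of a 2x2 feature map.
focused = apply_dense_prompt([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [1.0, 0.0]])
```

In this toy setup the first expert receives the largest gate weight because its gate vector aligns with the sparse prompt. The paper's decoder is described as routing prompts through cross-attention and gating, which this sketch does not attempt to reproduce.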

Paper Structure

This paper contains 29 sections, 23 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: K-Prism integrates three forms of external knowledge into a single framework: semantic priors (from annotated training datasets), in-context exemplars (from reference image–mask pairs), and interactive feedback (from user clicks and previous masks). This integration enables robust segmentation across diverse modalities and targets.
  • Figure 2: (a) Overview of the proposed K-Prism framework. Our model integrates three forms of external knowledge via the prompt fusion modules, encoding them into 1-D sparse queries and 2-D dense prompts to produce fusion feature maps. (b) The MoE decoder dynamically routes different prompts to specialized experts through cross-attention and gating, enabling task-aware specialization and robust segmentation across diverse scenarios.
  • Figure 3: Convergence curves of interactive segmentation on in-distribution, external, and unseen-class datasets. K-Prism consistently achieves higher Dice scores and faster convergence compared to all baselines.
  • Figure 4: Distribution of softmax expert weights across different modes on the external ACDC dataset.
  • Figure 5: Convergence curves of K-Prism's interactive segmentation (Mode-3) on all datasets.
  • ...and 3 more figures