Medverse: A Universal Model for Full-Resolution 3D Medical Image Segmentation, Transformation and Enhancement

Jiesi Hu, Jianfeng Cao, Yanwu Yang, Chenfei Ye, Yixuan Zhang, Hanyang Peng, Ting Ma

TL;DR

Medverse is a universal in-context learning (ICL) model for 3D medical images, trained on 22 diverse datasets to perform segmentation, transformation, and enhancement across unseen organs, modalities, and centers. It combines a next-scale autoregressive in-context learning framework (NA-ICL) with a Blockwise Cross-Attention Module (BAM) to achieve high-fidelity, full-resolution outputs while enabling long-range context–target interactions. Across held-out datasets, Medverse outperforms existing ICL baselines and approaches the performance of task-specific supervised models, demonstrating strong cross-domain generalization without fine-tuning. This work advances practical universal medical imaging AI by enabling adaptation to new tasks and domains without retraining, with public code and weights to facilitate deployment and further research.

Abstract

In-context learning (ICL) offers a promising paradigm for universal medical image analysis, enabling models to perform diverse image processing tasks without retraining. However, current ICL models for medical imaging remain limited in two critical aspects: they cannot simultaneously achieve high-fidelity predictions and global anatomical understanding, and there is no unified model trained across diverse medical imaging tasks (e.g., segmentation and enhancement) and anatomical regions. As a result, the full potential of ICL in medical imaging remains underexplored. We therefore present Medverse, a universal ICL model for 3D medical imaging, trained on 22 datasets covering diverse tasks in universal image segmentation, transformation, and enhancement across multiple organs, imaging modalities, and clinical centers. Medverse employs a next-scale autoregressive in-context learning framework that progressively refines predictions from coarse to fine, generating consistent, full-resolution volumetric outputs and enabling multi-scale anatomical awareness. We further propose a blockwise cross-attention module that facilitates long-range interactions between context and target inputs while preserving computational efficiency through spatial sparsity. Medverse is extensively evaluated on a broad collection of held-out datasets covering previously unseen clinical centers, organs, species, and imaging modalities. Results demonstrate that Medverse substantially outperforms existing ICL baselines and establishes a novel paradigm for in-context learning. Code and model weights are publicly available at https://github.com/jiesihu/Medverse.
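The coarse-to-fine idea behind the next-scale autoregressive framework can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the Medverse implementation: the function names (`downsample`, `upsample`, `toy_predictor`, `next_scale_inference`), the scale factors, and the thresholding rule are all hypothetical stand-ins. In particular, `toy_predictor` replaces the actual network with a simple intensity threshold estimated from one context image-label pair. What the sketch does show is the control flow: each scale receives the upsampled prediction from the previous, coarser scale, and the final scale produces an output at the target's full resolution.

```python
import numpy as np


def downsample(vol, factor):
    """Average-pool a 3D volume by an integer factor along each axis."""
    d, h, w = vol.shape
    blocks = vol.reshape(d // factor, factor, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3, 5))


def upsample(vol, factor):
    """Nearest-neighbour upsampling by an integer factor along each axis."""
    return vol.repeat(factor, axis=0).repeat(factor, axis=1).repeat(factor, axis=2)


def toy_predictor(target, prev_pred, ctx_img, ctx_lbl):
    """Hypothetical stand-in for the network (NOT the Medverse model).

    Estimates an intensity threshold from the context pair, applies it to the
    target, and blends the result with the previous-scale prediction.
    """
    fg = ctx_img[ctx_lbl > 0.5]
    bg = ctx_img[ctx_lbl <= 0.5]
    if fg.size == 0 or bg.size == 0:
        return prev_pred  # no usable context at this scale
    threshold = 0.5 * (fg.mean() + bg.mean())
    current = (target > threshold).astype(float)
    return 0.25 * prev_pred + 0.75 * current


def next_scale_inference(target, ctx_img, ctx_lbl, factors=(4, 2, 1)):
    """Refine a prediction from the coarsest scale to full resolution."""
    pred = None
    for i, f in enumerate(factors):
        t = downsample(target, f)
        ci = downsample(ctx_img, f)
        cl = downsample(ctx_lbl, f)
        if pred is None:
            prev = np.zeros_like(t)  # nothing to condition on at the first scale
        else:
            prev = upsample(pred, factors[i - 1] // f)  # carry coarse result forward
        pred = toy_predictor(t, prev, ci, cl)
    return pred


# Synthetic data: a bright cube on a noisy background (illustrative only).
rng = np.random.default_rng(0)


def make_pair(offset):
    lbl = np.zeros((16, 16, 16))
    lbl[offset:offset + 8, offset:offset + 8, offset:offset + 8] = 1.0
    img = lbl + 0.1 * rng.standard_normal(lbl.shape)
    return img, lbl


ctx_img, ctx_lbl = make_pair(2)
tgt_img, tgt_lbl = make_pair(6)
pred = next_scale_inference(tgt_img, ctx_img, ctx_lbl)
print(pred.shape)  # same shape as the full-resolution target
```

In a real system the coarse scales would supply global anatomical context that a patch-limited model cannot see at full resolution; here the blend with `prev_pred` only mimics that dependency.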

Paper Structure

This paper contains 22 sections, 9 equations, 10 figures, 8 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of different strategies for processing 3D medical images using ICL models. Due to the high resolution of volumetric data, direct end-to-end processing is often infeasible. The proposed next-scale autoregressive ICL framework progressively refines predictions from coarse to fine, producing full-resolution outputs.
  • Figure 2: Illustration of our model architecture and the inference pipeline of the next-scale autoregressive in-context learning framework.
  • Figure 3: Illustration of fusion modules and the blockwise cross-attention module.
  • Figure 4: Qualitative results of ICL models on segmentation tasks. For each 3D segmentation target, results are shown from two views. Red regions indicate segmentation errors. The 2D models take the slice corresponding to the first view as input. Medverse w/o NA-ICL denotes a variant of Medverse without autoregressive processing.
  • Figure 5: Qualitative results of 3D ICL models. Medverse w/o NA-ICL denotes the variant of Medverse without autoregressive context. The second row for each task presents zoomed-in views highlighting differences in generated details. Neuroverse3D produces outputs with limited resolution. Red arrows indicate artifacts.
  • ...and 5 more figures