Table of Contents
Fetching ...

KA$^2$ER: Knowledge Adaptive Amalgamation of ExpeRts for Medical Images Segmentation

Shangde Gao, Yichao Fu, Ke Liu, Hongxia Xu, Jian Wu

TL;DR

This work proposes an adaptive amalgamation knowledge framework that aims to train a versatile foundation model to handle the joint goals of multiple expert models, each specialized for a distinct task.

Abstract

Recently, many foundation models for medical image analysis such as MedSAM, SwinUNETR have been released and proven to be useful in multiple tasks. However, considering the inherent heterogeneity and inhomogeneity of real-world medical data, directly applying these models to specific medical image segmentation tasks often leads to negative domain shift effects, which can severely weaken the model's segmentation capabilities. To this end, we propose an adaptive amalgamation knowledge framework that aims to train a versatile foundation model to handle the joint goals of multiple expert models, each specialized for a distinct task. Specifically, we first train an nnUNet-based expert model for each task, and reuse the pre-trained SwinUNTER as the target foundation model. Then, the input data for all challenging tasks are encoded in the foundation model and the expert models, respectively, and their backbone features are jointly projected into the adaptive amalgamation layer. Within the hidden layer, the hierarchical attention mechanisms are designed to achieve adaptive merging of the target model to the hidden layer feature knowledge of all experts, which significantly reduces the domain shift arising from the inter-task differences. Finally, the gold amalgamated features and the prompt features are fed into the mask decoder to obtain the segmentation results. Extensive experiments conducted in these challenging tasks demonstrate the effectiveness and adaptability of our foundation model for real-world medical image segmentation.

KA$^2$ER: Knowledge Adaptive Amalgamation of ExpeRts for Medical Images Segmentation

TL;DR

This work proposes an adaptive amalgamation knowledge framework that aims to train a versatile foundation model to handle the joint goals of multiple expert models, each specialized for a distinct task.

Abstract

Recently, many foundation models for medical image analysis such as MedSAM, SwinUNETR have been released and proven to be useful in multiple tasks. However, considering the inherent heterogeneity and inhomogeneity of real-world medical data, directly applying these models to specific medical image segmentation tasks often leads to negative domain shift effects, which can severely weaken the model's segmentation capabilities. To this end, we propose an adaptive amalgamation knowledge framework that aims to train a versatile foundation model to handle the joint goals of multiple expert models, each specialized for a distinct task. Specifically, we first train an nnUNet-based expert model for each task, and reuse the pre-trained SwinUNTER as the target foundation model. Then, the input data for all challenging tasks are encoded in the foundation model and the expert models, respectively, and their backbone features are jointly projected into the adaptive amalgamation layer. Within the hidden layer, the hierarchical attention mechanisms are designed to achieve adaptive merging of the target model to the hidden layer feature knowledge of all experts, which significantly reduces the domain shift arising from the inter-task differences. Finally, the gold amalgamated features and the prompt features are fed into the mask decoder to obtain the segmentation results. Extensive experiments conducted in these challenging tasks demonstrate the effectiveness and adaptability of our foundation model for real-world medical image segmentation.

Paper Structure

This paper contains 17 sections, 2 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: (a) Input sub-task datasets (e.g., MyoPS$++$ split by modality, sequence) into the nnUNet framework, train them to obtain task-specific expert models via cross-entropy loss. (b) The knowledge adaptive amalgamation of experts framework includes a training phase (red dashed rectangular) and inference phase (blue dashed rectangular). In the training phase, the target student network model adaptively merges task-specific and task-agnostic expert knowledge based on the hierarchical attention module. In the inference phase, only a single target model is used for inference of input medical images.
  • Figure 2: Visualization of comprehensive multi-task segmentation results. The results are reported in terms of DSC for best and worst cases.