SAM-I2I: Unleash the Power of Segment Anything Model for Medical Image Translation

Jiayu Huo, Sebastien Ourselin, Rachel Sparks

TL;DR

SAM-I2I, a novel image-to-image translation framework based on the Segment Anything Model 2 (SAM2), outperforms state-of-the-art methods, offering more efficient and accurate medical image translation.

Abstract

Medical image translation is crucial for reducing the need for redundant and expensive multi-modal imaging in clinical practice. However, current approaches based on Convolutional Neural Networks (CNNs) and Transformers often fail to capture fine-grained semantic features, resulting in suboptimal image quality. To address this challenge, we propose SAM-I2I, a novel image-to-image translation framework based on the Segment Anything Model 2 (SAM2). SAM-I2I uses a pre-trained image encoder to extract multiscale semantic features from the source image and a decoder, based on the mask unit attention module, to synthesize target-modality images. Our experiments on multi-contrast MRI datasets demonstrate that SAM-I2I outperforms state-of-the-art methods, offering more efficient and accurate medical image translation.

Paper Structure

This paper contains 8 sections, 3 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of our SAM-I2I framework. The pre-trained image encoder in SAM2 is the backbone model used to extract hierarchical features. The image decoder based on the mask unit attention module fuses multiscale features to generate the target modality images. We freeze the weights of the image encoder and only optimize the decoder during the training stage.
  • Figure 2: Qualitative results of the image translation tasks. A $\rightarrow$ B indicates A is the source domain and B is the target domain. We provide source images, target images, and error maps for comparison.
  • Figure 3: Visualization of the first-stage features from different pre-trained models. Features from the Hiera model contain more detail than those from the other models.
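
The training setup described in the Figure 1 caption (a frozen pre-trained encoder that yields hierarchical features, and a trainable decoder that fuses them) can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyHierarchicalEncoder` and `ToyFusionDecoder` are hypothetical stand-ins for the SAM2 image encoder and the mask unit attention decoder, the fusion here is plain upsample-and-add convolution, and the L1 loss, learning rate, and tensor sizes are placeholders chosen only to make the example run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyHierarchicalEncoder(nn.Module):
    """Hypothetical stand-in for a pre-trained hierarchical encoder (not SAM2's Hiera)."""

    def __init__(self, in_ch=1, widths=(16, 32, 64)):
        super().__init__()
        stages, prev = [], in_ch
        for w in widths:
            # Each stage halves the spatial resolution.
            stages.append(nn.Sequential(nn.Conv2d(prev, w, 3, stride=2, padding=1), nn.GELU()))
            prev = w
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # ordered from fine (high resolution) to coarse


class ToyFusionDecoder(nn.Module):
    """Hypothetical decoder: simple top-down fusion, NOT mask unit attention."""

    def __init__(self, widths=(16, 32, 64), out_ch=1):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(w, widths[0], 1) for w in widths])
        self.head = nn.Conv2d(widths[0], out_ch, 3, padding=1)

    def forward(self, feats):
        laterals = list(self.lateral)
        fused = laterals[-1](feats[-1])  # start from the coarsest scale
        for lat, f in zip(laterals[-2::-1], feats[-2::-1]):
            fused = F.interpolate(fused, size=f.shape[-2:], mode="bilinear", align_corners=False)
            fused = fused + lat(f)
        # Undo the first stage's stride-2 downsampling to reach input resolution.
        fused = F.interpolate(fused, scale_factor=2, mode="bilinear", align_corners=False)
        return self.head(fused)


encoder = ToyHierarchicalEncoder()
decoder = ToyFusionDecoder()

# Freeze the encoder: no gradients, and keep it in eval mode.
for p in encoder.parameters():
    p.requires_grad_(False)
encoder.eval()

# Only decoder parameters are handed to the optimizer.
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)

source = torch.rand(2, 1, 64, 64)  # placeholder source-modality batch
target = torch.rand(2, 1, 64, 64)  # placeholder target-modality batch

with torch.no_grad():
    feats = encoder(source)        # multiscale features from the frozen encoder
pred = decoder(feats)              # synthesized target-modality image
loss = F.l1_loss(pred, target)     # assumed loss, for illustration only

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Running the encoder under `torch.no_grad()` avoids storing activations for a network that is never updated, which is one reason a frozen backbone can lower training cost.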