I2I-Galip: Unsupervised Medical Image Translation Using Generative Adversarial CLIP

Yilmaz Korkmaz, Vishal M. Patel

TL;DR

This work presents I2I-Galip, an unsupervised multi-domain medical image translation framework that leverages BiomedCLIP, a medical vision-language model, to guide translation with a single lightweight generator (~13M parameters) and domain-specific discriminators. By freezing BiomedCLIP and using CLIP-based losses (L_clip, L_cls) along with cycle-consistency (L_cycle) and identity constraints, the approach achieves competitive or superior PSNR and SSIM across multi-domain MRI (IXI) and CT-MRI datasets while reducing model complexity compared to traditional many-pair GAN baselines. Ablation indicates most gains arise from adversarial and cycle losses, with CLIP-guided terms offering additional benefits mainly in multi-domain settings; limitations include reliance on captions and potential instability from GAN training. Overall, I2I-Galip demonstrates that a foundation-model-guided, cycle-consistent GAN can deliver high-quality, multi-domain medical image translations with a lightweight architecture, broadening the practicality of unsupervised domain translation in clinical imaging.
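PSNR, one of the two metrics named above, can be illustrated with a minimal sketch. It assumes images are flattened to lists of intensities normalized to [0, 1]; the function name `psnr` and the `data_range` parameter are illustrative choices, not taken from the paper's code.

```python
import math

def psnr(reference, translated, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel lists."""
    if len(reference) != len(translated):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, translated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)

# Made-up toy pixels, assumed normalized to [0, 1].
reference = [0.0, 0.5, 1.0, 0.25]
translated = [0.1, 0.5, 0.9, 0.25]
print(round(psnr(reference, translated), 2))  # → 23.01
```

Higher values indicate a translation closer to the reference image.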

Abstract

Unpaired image-to-image translation is a challenging task due to the absence of paired examples, which complicates learning the complex mappings between the distinct distributions of the source and target domains. One of the most commonly used approaches for this task is CycleGAN, which requires training a new pair of generator-discriminator networks for each domain pair. In this paper, we propose a new image-to-image translation framework named Image-to-Image-Generative-Adversarial-CLIP (I2I-Galip), in which we utilize a pre-trained multi-modal foundation model (i.e., CLIP) to mitigate the need for separate generator-discriminator pairs for each source-target mapping while achieving better and more efficient multi-domain translation. By utilizing the massive knowledge gathered during pre-training of a foundation model, our approach makes use of a single lightweight generator network with ~13M parameters for the multi-domain image translation task. Comprehensive experiments on translation performance in public MRI and CT datasets show the superior performance of the proposed framework over the existing approaches. Code will be available (https://github.com/yilmazkorkmaz1/I2I-Galip).
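The scaling argument in the abstract can be made concrete with a small counting sketch. It assumes a CycleGAN-style baseline trains two generators and two discriminators per unordered domain pair, and that the single-generator setup uses one discriminator per domain; both function names are hypothetical.

```python
def cyclegan_network_counts(num_domains):
    """Networks needed when every unordered domain pair gets its own CycleGAN."""
    pairs = num_domains * (num_domains - 1) // 2
    # Each pair needs one generator and one discriminator per direction.
    return {"generators": 2 * pairs, "discriminators": 2 * pairs}

def single_generator_network_counts(num_domains):
    """Networks needed with one shared generator (assumed: one discriminator per domain)."""
    return {"generators": 1, "discriminators": num_domains}

# Three MRI contrasts (T1, T2, PD), as in the IXI experiments.
print(cyclegan_network_counts(3))          # → {'generators': 6, 'discriminators': 6}
print(single_generator_network_counts(3))  # → {'generators': 1, 'discriminators': 3}
```

The pairwise count grows quadratically with the number of domains, while the shared-generator count stays at one generator regardless of how many domains are added.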

Paper Structure

This paper contains 16 sections, 1 equation, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Training scheme and overall model architecture of I2I-Galip, illustrated for an input image from domain B. Part a illustrates the $L_{clip}$, $L_{cls}$, and $L_{adv}$ losses along with the U-Net-based generator, the discriminator head, and BiomedCLIP's ViT-B. Parts b and c show the definitions of the $L_{cycle}$ and $L_{identity}$ losses, respectively. The parameters of BiomedCLIP's ViT-B and text encoder are frozen during training. "This MRI Image is T2-weighted" is a sample caption used in T1-to-T2 translation.
  • Figure 2: Multi-domain translation from PD to T1-weighted images in the IXI dataset. Error maps are shown below each translation and magnified sections above it.
  • Figure 3: Single-domain translation from T1-weighted pelvic MRI to CT images. Error maps are shown below each translation and magnified sections above it.
  • Figure 4: Multi-domain translation from T1-weighted to T2-weighted images in the IXI dataset. Error maps are shown below each translation and magnified sections above it.
  • Figure 5: Multi-domain translation from T2-weighted to PD images in the IXI dataset. Error maps are shown below each translation and magnified sections above it.
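
The cycle-consistency and identity terms named in the Figure 1 caption can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: the L1 distance follows the usual CycleGAN convention, the generator is assumed to take an image plus a target-domain label, and `toy_generator` with its `DOMAIN_MEAN` table is a hypothetical stand-in for the U-Net.

```python
def l1_loss(a, b):
    """Mean absolute difference between two equally sized flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical stand-in for the generator: shifts intensities so the image
# mean matches a made-up per-domain mean. Not the paper's U-Net.
DOMAIN_MEAN = {"A": 0.3, "B": 0.5}

def toy_generator(image, target_domain):
    shift = DOMAIN_MEAN[target_domain] - sum(image) / len(image)
    return [p + shift for p in image]

def cycle_loss(generator, image, source_domain, target_domain):
    """Translate to the target domain and back, then compare with the input."""
    translated = generator(image, target_domain)
    reconstructed = generator(translated, source_domain)
    return l1_loss(reconstructed, image)

def identity_loss(generator, image, source_domain):
    """Ask the generator for the image's own domain; the input should be unchanged."""
    return l1_loss(generator(image, source_domain), image)

# Made-up input treated as coming from domain B (as in Figure 1).
image_b = [0.2, 0.4, 0.6]
print(cycle_loss(toy_generator, image_b, "B", "A"))
print(identity_loss(toy_generator, image_b, "B"))
```

Both terms are zero only when the generator leaves domain-B content intact, which is the behavior these constraints are meant to encourage during training.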