I2I-Galip: Unsupervised Medical Image Translation Using Generative Adversarial CLIP
Yilmaz Korkmaz, Vishal M. Patel
TL;DR
This work presents I2I-Galip, an unsupervised multi-domain medical image translation framework that uses BiomedCLIP, a medical vision-language model, to guide translation with a single lightweight generator (~13M parameters) and domain-specific discriminators. BiomedCLIP is kept frozen, and CLIP-based losses (L_clip, L_cls) are combined with cycle-consistency (L_cycle) and identity constraints. The approach achieves competitive or superior PSNR and SSIM on multi-domain MRI (IXI) and CT-MRI datasets while reducing model complexity compared to GAN baselines that train separate networks for each domain pair. Ablations indicate that most gains arise from the adversarial and cycle losses, with the CLIP-guided terms offering additional benefits mainly in multi-domain settings. Limitations include reliance on captions and potential instability from GAN training. Overall, I2I-Galip demonstrates that a foundation-model-guided, cycle-consistent GAN can deliver high-quality multi-domain medical image translation with a lightweight architecture, making unsupervised domain translation more practical in clinical imaging.
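To make the cycle-consistency and identity constraints named above concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes a single generator called as `generator(image, target_domain)` and uses a mean absolute error for both terms. The names `l1`, `cycle_loss`, `identity_loss`, and `shift_generator` are hypothetical, as are the integer domain labels, which stand in for whatever conditioning signal the real model uses.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))


def cycle_loss(x, src, tgt, generator):
    """Cycle term: translate src -> tgt -> src and compare with the input."""
    x_fake = generator(x, tgt)      # translate into the target domain
    x_rec = generator(x_fake, src)  # translate back to the source domain
    return l1(x_rec, x)


def identity_loss(x, src, generator):
    """Identity term: requesting the image's own domain should change nothing."""
    return l1(generator(x, src), x)


def shift_generator(x, domain):
    """Hypothetical toy generator: adds the integer domain label to every pixel.

    It is deliberately not cycle-consistent, so the cycle term is non-zero.
    """
    return x + domain


# Toy 4x4 "image" in domain 0, translated to domain 1 and back.
x = np.arange(16, dtype=float).reshape(4, 4)
print("cycle loss:", cycle_loss(x, src=0, tgt=1, generator=shift_generator))
print("identity loss:", identity_loss(x, src=0, generator=shift_generator))
```

In this toy setup the round trip leaves a constant offset of 1 on every pixel, so the cycle term penalizes the generator, while the identity term is already satisfied. A trained generator would be pushed to drive both toward zero alongside the adversarial and CLIP-based terms.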
Abstract
Unpaired image-to-image translation is challenging because the absence of paired examples complicates learning the mapping between the distinct distributions of the source and target domains. One of the most commonly used approaches for this task is CycleGAN, which requires training a new pair of generator-discriminator networks for each domain pair. In this paper, we propose a new image-to-image translation framework named Image-to-Image-Generative-Adversarial-CLIP (I2I-Galip), in which we utilize a pre-trained multi-modal foundation model (i.e., CLIP) to mitigate the need for separate generator-discriminator pairs for each source-target mapping while achieving better and more efficient multi-domain translation. By exploiting the knowledge acquired during pre-training of the foundation model, our approach uses a single lightweight generator network with ~13M parameters for the multi-domain image translation task. Comprehensive experiments on public MRI and CT datasets show the superior translation performance of the proposed framework over existing approaches. Code will be available (https://github.com/yilmazkorkmaz1/I2I-Galip).
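The scaling argument can be illustrated with a small counting sketch. It assumes the standard CycleGAN setup of two generators and two discriminators per unordered domain pair, and, for the shared-generator case, one discriminator per domain. The function names and the choice of four domains are hypothetical and are not taken from the paper.

```python
def pairwise_cyclegan_networks(n_domains):
    """Networks needed when every unordered domain pair gets its own CycleGAN.

    Assumes two generators (A->B, B->A) and two discriminators per pair.
    """
    pairs = n_domains * (n_domains - 1) // 2
    return {"generators": 2 * pairs, "discriminators": 2 * pairs}


def shared_generator_networks(n_domains):
    """Networks needed with one shared generator.

    Assumes one discriminator per domain (an illustrative assumption).
    """
    return {"generators": 1, "discriminators": n_domains}


n = 4  # hypothetical number of imaging domains
print("pairwise CycleGAN:", pairwise_cyclegan_networks(n))
print("shared generator: ", shared_generator_networks(n))
```

Under these assumptions the pairwise approach grows quadratically in the number of domains, whereas the shared-generator design grows linearly and only in the discriminators.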
