Flow Matching for Medical Image Synthesis: Bridging the Gap Between Speed and Quality
Milad Yazdani, Yasamin Medghalchi, Pooria Ashrafian, Ilker Hacihaliloglu, Dena Shahriari
TL;DR
This paper tackles data scarcity and slow sampling in medical image synthesis by proposing Medical Optimal Transport Flow Matching (MOTFM), which learns a velocity field that transports samples from a noise distribution to the target image distribution along near-straight paths. By formulating the problem as optimal transport flow matching, MOTFM achieves substantially faster inference than diffusion models while delivering equal or superior image quality, and it supports unconditional, class-conditioned, and mask-conditioned generation across 2D and 3D modalities with end-to-end training. Empirical results on CAMUS echocardiography and 3D MSD MRI Brain data show that MOTFM consistently outperforms diffusion baselines in quality metrics and inference speed, with strong gains in downstream classification and segmentation tasks as well as a demonstrated denoising capability. Overall, MOTFM offers a practical, flexible, and scalable alternative to diffusion for medical image generation, with potential applications in data augmentation, image-to-image translation, and enhancement.
Abstract
Deep learning models have emerged as a powerful tool for various medical applications. However, their success depends on large, high-quality datasets that are challenging to obtain due to privacy concerns and costly annotation. Generative models, such as diffusion models, offer a potential solution by synthesizing medical images, but their practical adoption is hindered by long inference times. In this paper, we propose the use of an optimal transport flow matching approach to accelerate image generation. By introducing a straighter mapping between the source and target distributions, our method significantly reduces inference time while matching or improving the quality of the outputs. Furthermore, this approach is highly adaptable, supporting various medical imaging modalities, conditioning mechanisms (such as class labels and masks), and different spatial dimensions, including 2D and 3D. Beyond image generation, it can also be applied to related tasks such as image enhancement. Our results demonstrate the efficiency and versatility of this framework, making it a promising advancement for medical imaging applications. Code, checkpoints, and a synthetic dataset (beneficial for classification and segmentation) are available at https://github.com/milad1378yz/MOTFM.
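To make the idea of a "straighter mapping" concrete, the following is a minimal sketch in plain Python and is not the authors' implementation. It assumes the linear interpolation path commonly used in flow matching, x_t = (1 - t) x_0 + t x_1, whose velocity target is the constant x_1 - x_0. The function names, the toy four-element "image", and the oracle velocity field that stands in for a trained network are all hypothetical; the optimal transport pairing of noise and data samples and any class or mask conditioning are omitted.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from x0 (noise) to x1 (data) at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity field; constant along a straight path."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check: a hypothetical oracle field for one (noise, image) pair.
# A trained network would replace this oracle in practice.
rng = random.Random(0)
x1 = [0.2, 0.8, 0.5, 0.1]                 # stand-in for a flattened image
x0 = [rng.gauss(0.0, 1.0) for _ in x1]    # Gaussian noise sample


def oracle(x, t):
    """Ideal velocity for this single pair (assumption for illustration)."""
    return target_velocity(x0, x1)


one_step = euler_sample(oracle, x0, num_steps=1)
ten_steps = euler_sample(oracle, x0, num_steps=10)
loss_at_half = flow_matching_loss(oracle, x0, x1, t=0.5)
```

In this idealized setting the velocity is constant along the path, so a single Euler step and ten Euler steps land on the same endpoint. A learned velocity field is only approximately straight, so in practice a small number of steps is typically still required, but this is the intuition behind the reduced inference time relative to diffusion samplers that follow curved trajectories.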
