LatentFM: A Latent Flow Matching Approach for Generative Medical Image Segmentation

Huynh Trinh Ngoc, Hoang Anh Nguyen Kim, Toan Nguyen Hai, Long Tran Quoc

TL;DR

LatentFM addresses uncertainty in medical image segmentation by combining two variational autoencoders, which embed images and masks into latent spaces, with a conditional flow-matching model that learns the latent mask distribution conditioned on the image. The approach generates multiple segmentation samples and corresponding confidence maps, improving robustness and interpretability. Experiments on ISIC-2018 and CVC-ClinicDB show LatentFM achieving state-of-the-art Dice and IoU scores among generative methods while keeping computation efficient by working in latent space. The work advances uncertainty-aware segmentation by learning the mask distribution directly in latent space rather than relying on a single deterministic output.

Abstract

Generative models have achieved remarkable progress with the emergence of flow matching (FM). FM has demonstrated strong generative capabilities and attracted significant attention as a simulation-free, flow-based framework capable of learning exact data densities. Motivated by these advances, we propose LatentFM, a flow-based model operating in the latent space for medical image segmentation. To model the data distribution, we first design two variational autoencoders (VAEs) that encode medical images and their corresponding masks into a lower-dimensional latent space. We then estimate a conditional velocity field that guides the flow based on the input image. By sampling multiple latent representations, our method synthesizes diverse segmentation outputs whose pixel-wise variance captures the underlying data distribution, enabling predictions that are both accurate and uncertainty-aware. Furthermore, we generate confidence maps that quantify model certainty, providing clinicians with richer information for deeper analysis. We conduct experiments on two datasets, ISIC-2018 and CVC-ClinicDB, and compare our method with several prior baselines, including both deterministic and generative models. In comprehensive qualitative and quantitative evaluations, our approach achieves superior segmentation accuracy while remaining highly efficient in the latent space.
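
The sampling step described in the abstract (start from noise in the mask latent space and follow an image-conditioned velocity field) can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_velocity` and `sample_latent` are hypothetical names, the fixed-step Euler solver is an assumed choice, and the closed-form velocity stands in for a trained network. The toy assumes a straight-line path from noise to a single target latent, taken here to be the conditioning vector itself.

```python
import random


def toy_velocity(z, t, cond):
    """Hypothetical stand-in for a trained network v_theta(z, t, cond).

    Assumption: the path from noise to data is a straight line ending at one
    target latent (here simply `cond`). For such a path the exact velocity
    at time t < 1 is (target - z) / (1 - t).
    """
    return [(c - zi) / (1.0 - t) for zi, c in zip(z, cond)]


def sample_latent(velocity, cond, num_steps=10, rng=None):
    """Integrate dz/dt = velocity(z, t, cond) from t=0 to t=1 with Euler steps."""
    rng = rng or random.Random()
    # Start from standard Gaussian noise with the same size as the latent.
    z = [rng.gauss(0.0, 1.0) for _ in cond]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the toy field never divides by zero
        v = velocity(z, t, cond)
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z


if __name__ == "__main__":
    image_latent = [0.5, -1.0, 2.0]  # made-up conditioning vector
    print(sample_latent(toy_velocity, image_latent, rng=random.Random(0)))
```

In this toy every noise draw ends at the same point, because the target is deterministic. With a learned velocity field, different noise draws would be expected to land on different plausible mask latents, which a mask decoder would then turn into the diverse segmentation outputs the abstract refers to.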

Paper Structure

This paper contains 11 sections, 17 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Illustration of our pipeline for conditional FM-based segmentation during the reverse process.
  • Figure 2: Qualitative comparison between our proposed LatentFM and other generative baseline approaches. For each model, the first four columns show the initial segmentation outputs, followed by the corresponding confidence map and the averaged output mask.
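
The layout described for Figure 2 (several sampled masks, then a confidence map and an averaged mask) suggests a simple aggregation step, sketched below. The summary does not state how the paper computes its confidence map, so the formula here is an assumption: the per-pixel mean of the sampled binary masks is treated as a foreground probability p, and confidence is taken as |2p - 1|, which is 1 when all samples agree and 0 when they split evenly. The function name `aggregate_masks` and the 0.5 threshold are likewise hypothetical.

```python
from statistics import fmean


def aggregate_masks(samples, threshold=0.5):
    """Combine N sampled binary masks (lists of rows of 0/1 values).

    Returns (mean, confidence, final):
      mean       - per-pixel average over the samples, read as P(foreground)
      confidence - |2p - 1| per pixel (assumed definition, not from the paper)
      final      - averaged mask binarized at `threshold`
    """
    height, width = len(samples[0]), len(samples[0][0])
    mean = [
        [fmean(s[r][c] for s in samples) for c in range(width)]
        for r in range(height)
    ]
    confidence = [[abs(2.0 * p - 1.0) for p in row] for row in mean]
    final = [[1 if p >= threshold else 0 for p in row] for row in mean]
    return mean, confidence, final


if __name__ == "__main__":
    # Four made-up 2x2 samples, mirroring the four sample columns in Figure 2.
    samples = [
        [[1, 0], [1, 1]],
        [[1, 0], [0, 1]],
        [[1, 0], [1, 0]],
        [[1, 0], [0, 1]],
    ]
    mean, confidence, final = aggregate_masks(samples)
    print(mean, confidence, final)
```

Pixels where the samples disagree, such as the bottom-left pixel in the example, receive low confidence, which is the kind of signal a confidence map is meant to surface for review.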