End-to-end autoencoding architecture for the simultaneous generation of medical images and corresponding segmentation masks
Aghiles Kebaili, Jérôme Lapuyade-Lahorgue, Pierre Vera, Su Ruan
TL;DR
The paper tackles data scarcity in medical image segmentation by proposing an end-to-end Hamiltonian Variational Autoencoder (HVAE) that jointly generates medical images and their tumor masks. It introduces a posterior sampling framework based on Hamiltonian Monte Carlo and a joint evidence lower bound (ELBO) over the paired data, enabling the synthesis of realistic, matched image–mask pairs. Experiments on BRATS and HECKTOR show that HVAE-based augmentation outperforms a vanilla VAE and an LSGAN in data-scarce regimes, yielding a higher Dice similarity coefficient (DSC) and better image quality (PSNR/SSIM), particularly with ~300 synthetic samples. The work demonstrates that end-to-end joint generation can meaningfully boost segmentation performance when data are limited, and it points to future work on hybrids with adversarial learning and on exploring the geometry of the latent space.
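The idea of a joint ELBO over paired data can be illustrated with a minimal single-sample sketch. It assumes one shared latent code z for the image x and mask y, conditional independence p(x, y | z) = p(x | z) p(y | z), a Gaussian image decoder with fixed variance σ², a per-pixel Bernoulli mask decoder, and a diagonal-Gaussian encoder with a standard-normal prior. All function names are hypothetical, and images and masks are flattened into plain Python lists. This is the standard VAE form of the bound; it does not reproduce the Hamiltonian variant used by the HVAE.

```python
import math
from typing import Sequence

# Hypothetical helper names for illustration; not the paper's code.


def gaussian_log_likelihood(x: Sequence[float], x_hat: Sequence[float], sigma: float = 1.0) -> float:
    """log N(x | x_hat, sigma^2 I) for a flattened image (fixed sigma is an assumption)."""
    n = len(x)
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return -0.5 * sq_err / sigma ** 2 - n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi)


def bernoulli_log_likelihood(y: Sequence[int], y_prob: Sequence[float], eps: float = 1e-7) -> float:
    """Per-pixel Bernoulli log-likelihood of a binary mask given predicted foreground probabilities."""
    total = 0.0
    for t, p in zip(y, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total


def kl_diag_gaussian_to_std_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Analytic KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def joint_elbo(x, x_hat, y, y_prob, mu, log_var, sigma: float = 1.0) -> float:
    """Single-sample joint ELBO: log p(x|z) + log p(y|z) - KL(q(z|x,y) || p(z))."""
    return (
        gaussian_log_likelihood(x, x_hat, sigma)
        + bernoulli_log_likelihood(y, y_prob)
        - kl_diag_gaussian_to_std_normal(mu, log_var)
    )
```

In training, the negative of this quantity would be minimized, with `x_hat`, `y_prob`, `mu` and `log_var` produced by the decoder and encoder. Because both reconstruction terms share the same z, the bound only rewards latent codes that explain the image and its mask together.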
Abstract
Despite the increasing use of deep learning in medical image segmentation, acquiring sufficient training data remains a challenge in the medical field. Data augmentation techniques have been proposed in response; however, generating diverse and realistic medical images together with their corresponding masks remains difficult, especially when the training set is small. To address these limitations, we present an end-to-end architecture based on the Hamiltonian Variational Autoencoder (HVAE). This approach yields a better approximation of the posterior distribution than traditional Variational Autoencoders (VAE), resulting in higher image generation quality. Our method outperforms generative adversarial architectures under data-scarce conditions, improving both image quality and the precision of the synthesized tumor masks. We conduct experiments on two publicly available datasets, MICCAI's Brain Tumor Segmentation Challenge (BRATS) and Head and Neck Tumor Segmentation Challenge (HECKTOR), demonstrating the effectiveness of our method across different medical imaging modalities.
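The Hamiltonian refinement of the posterior rests on integrating Hamiltonian dynamics with the leapfrog scheme. Below is a minimal sketch of that integrator alone, assuming unit mass, no tempering schedule, and a hypothetical two-dimensional Gaussian potential U(z) standing in for −log p(x, z); the names `MEAN`, `STD`, `potential` and `grad_potential` are illustrative and this is not the authors' implementation. Leapfrog is time-reversible and volume-preserving, which is what keeps the density of a latent code transformed by such a flow tractable.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def leapfrog(z: Vector, rho: Vector, grad_u: Callable[[Vector], Vector],
             step: float, n_steps: int) -> Tuple[Vector, Vector]:
    """Integrate H(z, rho) = U(z) + |rho|^2 / 2 with the leapfrog scheme (unit mass)."""
    z, rho = list(z), list(rho)
    for _ in range(n_steps):
        # Half step on the momentum using the potential gradient at the current position.
        rho = [r - 0.5 * step * g for r, g in zip(rho, grad_u(z))]
        # Full step on the position using the updated momentum.
        z = [zi + step * r for zi, r in zip(z, rho)]
        # Second half step on the momentum at the new position.
        rho = [r - 0.5 * step * g for r, g in zip(rho, grad_u(z))]
    return z, rho


# Hypothetical toy target: U(z) = -log N(z; MEAN, diag(STD^2)) up to a constant.
MEAN = [1.0, -2.0]
STD = [1.0, 0.5]


def potential(z: Vector) -> float:
    return 0.5 * sum(((zi - m) / s) ** 2 for zi, m, s in zip(z, MEAN, STD))


def grad_potential(z: Vector) -> Vector:
    return [(zi - m) / s ** 2 for zi, m, s in zip(z, MEAN, STD)]


def hamiltonian(z: Vector, rho: Vector) -> float:
    """Total energy: potential plus kinetic term for unit mass."""
    return potential(z) + 0.5 * sum(r * r for r in rho)


if __name__ == "__main__":
    z0, rho0 = [0.0, 0.0], [0.3, -0.4]
    z1, rho1 = leapfrog(z0, rho0, grad_potential, step=0.01, n_steps=50)
    # The energy should be nearly conserved for a small step size.
    print(hamiltonian(z0, rho0), hamiltonian(z1, rho1))
```

With a small step size the energy drifts only slightly, and running the integrator again from the end state with negated momentum returns to the starting point. In a trained model, `grad_potential` would instead be obtained by differentiating the decoder's log-joint with respect to z.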
