Discriminative Hamiltonian Variational Autoencoder for Accurate Tumor Segmentation in Data-Scarce Regimes
Aghiles Kebaili, Jérôme Lapuyade-Lahorgue, Pierre Vera, Su Ruan
TL;DR
This work tackles data scarcity in medical tumor segmentation by introducing a discriminative regularized Hamiltonian VAE (dHVAE) that jointly models image and tumor mask distributions as $p_\theta(x,m|z)$ and synthesizes image–mask pairs in a single pass. It combines a perceptual and pixel-wise feature reconstruction loss with a small-weight adversarial regularization, enabling realistic, diverse samples while preserving mode coverage in limited-data settings. The method employs a slice-by-slice 2D-to-3D augmentation strategy within a four-block encoder–decoder HVAE architecture and demonstrates significant improvements in downstream Dice scores on BRATS (MRI) and HECKTOR (PET) datasets compared to traditional augmentation and several generative baselines. This approach offers a practical, data-efficient path to improve tumor segmentation in clinical scenarios where annotated data are scarce, with potential extensions to quantum-inspired latent-density formulations.
Abstract
Deep learning has gained significant attention in medical image segmentation. However, the limited availability of annotated training data makes accurate results difficult to achieve. Data augmentation techniques have been proposed to overcome this challenge, but most of them focus on image generation alone. For segmentation tasks, providing both images and their corresponding target masks is crucial, and generating diverse and realistic samples remains a complex task, especially when working with limited training datasets. To this end, we propose a new end-to-end hybrid architecture based on Hamiltonian Variational Autoencoders (HVAE) and a discriminative regularization to improve the quality of generated images. Our method provides an accurate estimation of the joint distribution of the images and masks, resulting in the generation of realistic medical images with reduced artifacts and off-distribution instances. As generating 3D volumes requires substantial time and memory, our architecture operates on a slice-by-slice basis to segment 3D volumes, capitalizing on the richly augmented dataset. Experiments conducted on two public datasets, BRATS (MRI modality) and HECKTOR (PET modality), demonstrate the efficacy of our proposed method on different medical imaging modalities with limited data.
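To make the slice-by-slice idea concrete, the following is a minimal sketch of segmenting a 3D volume one 2D slice at a time and scoring the result with the Dice coefficient. It is an illustration under stated assumptions, not the authors' implementation: the function names (`segment_volume_slicewise`, `dice_score`, `threshold_segmenter`) are hypothetical, the slicing axis is assumed to be the first array axis, and a simple intensity threshold stands in for a trained 2D segmentation network.

```python
import numpy as np


def segment_volume_slicewise(volume, segment_slice):
    """Apply a 2D segmentation function to every slice along axis 0
    and restack the per-slice masks into a 3D mask."""
    masks = [segment_slice(volume[k]) for k in range(volume.shape[0])]
    return np.stack(masks, axis=0)


def dice_score(pred, target, eps=1e-7):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks.
    eps avoids division by zero when both masks are empty."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


def threshold_segmenter(slice_2d, threshold=0.5):
    """Hypothetical stand-in for a trained 2D network:
    marks pixels above an intensity threshold as foreground."""
    return slice_2d > threshold


# Toy usage: a random volume of 4 slices, each 8x8 pixels.
rng = np.random.default_rng(0)
volume = rng.random((4, 8, 8))
predicted_mask = segment_volume_slicewise(volume, threshold_segmenter)
reference_mask = volume > 0.5
score = dice_score(predicted_mask, reference_mask)
```

In practice `segment_slice` would wrap a 2D model trained on the augmented image and mask pairs; processing slices independently keeps memory use bounded by a single slice rather than the full volume.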
