Handling Uncertainty in Health Data using Generative Algorithms
Mahdi Arab Loodaricheh, Neh Majmudar, Anita Raja, Ansaf Salleb-Aouissi
TL;DR
This paper tackles uncertainty and severe class imbalance in healthcare analytics by introducing RIGA, a four-phase pipeline that converts tabular health data into 28x28 images and uses generative models (cGAN, VQVAE, VQGAN) to synthesize minority-class samples. Generated images are either classified directly by CNNs or transformed back to tabular form for traditional models such as XGBoost, with performance evaluated primarily by AUC. A Bayesian-network learning stage (the U2 pipeline) assesses how augmentation affects feature dependencies, using structure learning scored with the Bayesian Information Criterion (BIC) and Markov blanket visualizations. Empirical results on nuMoM2b, Madelon, Myocardial Infarction complications, and DARWIN show dataset-dependent gains: VQGAN yields the strongest improvements on the larger datasets, VQVAE performs best on the smaller ones, and augmentation also improves Bayesian structure discovery.
Abstract
Understanding and managing uncertainty is crucial in machine learning, especially in high-stakes domains such as healthcare, where class imbalance can skew predictions. This paper introduces RIGA, a novel pipeline that mitigates class imbalance using generative AI. By converting tabular healthcare data into images, RIGA leverages models such as cGAN, VQVAE, and VQGAN to generate samples for underrepresented classes and balance the data, improving classification performance. The image representations are processed by CNNs and can be transformed back into tabular format for integration with existing tabular workflows. This approach enhances traditional classifiers such as XGBoost, improves Bayesian structure learning, and strengthens ML model robustness by generating realistic synthetic data for underrepresented classes.
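To make the tabular-to-image round trip concrete, the following is a minimal sketch of one way a row of features could be mapped to a 28x28 image and recovered afterwards. It is not RIGA's actual transform, which is not specified in this summary. The sketch assumes per-feature min-max scaling, row-major pixel placement, and zero padding of unused pixels; the function names (fit_minmax, rows_to_images, images_to_rows) are hypothetical.

```python
import numpy as np

IMG_SIDE = 28  # image side length taken from the text; 28 * 28 = 784 pixel slots


def fit_minmax(X):
    """Return per-feature minimum and range (assumed scaling scheme)."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    # Constant columns get a range of 1.0 to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return lo, span


def rows_to_images(X, lo, span):
    """Scale each feature to [0, 1], zero-pad to 784 values, reshape to 28x28."""
    n, d = X.shape
    if d > IMG_SIDE * IMG_SIDE:
        raise ValueError("more features than pixels in a 28x28 image")
    padded = np.zeros((n, IMG_SIDE * IMG_SIDE))
    padded[:, :d] = (X - lo) / span
    return padded.reshape(n, IMG_SIDE, IMG_SIDE)


def images_to_rows(images, lo, span):
    """Inverse transform: flatten, drop the padding, undo the scaling."""
    d = lo.shape[0]
    flat = images.reshape(len(images), -1)[:, :d]
    return flat * span + lo


# Toy data standing in for a tabular health dataset (synthetic, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))

lo, span = fit_minmax(X)
images = rows_to_images(X, lo, span)
restored = images_to_rows(images, lo, span)
```

In a pipeline of this kind, synthetic minority-class images produced by a generative model would be passed through the inverse step to obtain tabular rows for a classifier such as XGBoost. The scaling parameters must come from the original training data so that the inverse mapping is well defined.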
