Handling Uncertainty in Health Data using Generative Algorithms

Mahdi Arab Loodaricheh, Neh Majmudar, Anita Raja, Ansaf Salleb-Aouissi

TL;DR

This paper tackles uncertainty and severe class imbalance in healthcare analytics by introducing RIGA, a four-phase pipeline that converts tabular health data into 28x28 images and uses generative augmentation (cGAN, VQVAE, VQGAN) to synthesize minority-class samples. Generated images are either classified directly with CNNs or transformed back to tabular form for traditional models such as XGBoost, with performance evaluated primarily by AUC. A Bayesian-network learning stage (the U2 pipeline) assesses how augmentation affects feature dependencies, using structure learning with the BIC criterion and Markov Blanket visualizations. Empirical results on nuMoM2b, Madelon, Myocardial Infarction complications, and DARWIN show dataset-dependent gains: VQGAN gives the strongest improvements on the larger datasets and VQVAE performs best on the smaller ones, and augmentation also improves Bayesian structure discovery.
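To make the tabular-to-image step and its inverse concrete, here is a minimal sketch of one possible round trip. The summary does not say how RIGA arranges features in the 28x28 grid, so the mapping below (per-feature min-max scaling, zero-padding to 784 values, row-major reshape) is an assumption chosen for simplicity, and the function names `tabular_to_images` and `images_to_tabular` are hypothetical.

```python
import numpy as np

IMAGE_SIDE = 28                      # image size stated in the summary
N_PIXELS = IMAGE_SIDE * IMAGE_SIDE   # 784 pixel slots per sample


def tabular_to_images(X, col_min=None, col_max=None):
    """Assumed mapping: scale each feature to [0, 1], zero-pad, reshape to 28x28."""
    X = np.asarray(X, dtype=np.float64)
    n_samples, n_features = X.shape
    if n_features > N_PIXELS:
        raise ValueError("at most 784 features fit in one 28x28 image")
    col_min = X.min(axis=0) if col_min is None else col_min
    col_max = X.max(axis=0) if col_max is None else col_max
    # Constant columns get a span of 1 to avoid division by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    padded = np.zeros((n_samples, N_PIXELS))
    padded[:, :n_features] = (X - col_min) / span
    return padded.reshape(n_samples, IMAGE_SIDE, IMAGE_SIDE), col_min, col_max


def images_to_tabular(images, n_features, col_min, col_max):
    """Inverse of the assumed mapping: drop the padding and undo the scaling."""
    flat = np.asarray(images).reshape(len(images), N_PIXELS)[:, :n_features]
    return flat * (col_max - col_min) + col_min


# Toy data standing in for a health table: 5 samples, 30 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 30))
images, lo, hi = tabular_to_images(X)
X_back = images_to_tabular(images, X.shape[1], lo, hi)
```

In this sketch the scaling statistics `lo` and `hi` are kept so that images produced by a generative model could be mapped back into the original feature ranges before being passed to a tabular classifier.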

Abstract

Understanding and managing uncertainty is crucial in machine learning, especially in high-stakes domains like healthcare, where class imbalance can impact predictions. This paper introduces RIGA, a novel pipeline that mitigates class imbalance using generative AI. By converting tabular healthcare data into images, RIGA leverages models like cGAN, VQVAE, and VQGAN to generate balanced samples, improving classification performance. These representations are processed by CNNs and later transformed back into tabular format for seamless integration. This approach enhances traditional classifiers like XGBoost, improves Bayesian structure learning, and strengthens ML model robustness by generating realistic synthetic data for underrepresented classes.
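As a small illustration of what "generating balanced samples" could mean in practice, the sketch below counts how many synthetic samples each class would need to match the largest class. Balancing up to the majority-class count is an assumption here; the abstract does not state the target ratio RIGA uses, and `synthetic_counts_to_balance` is a hypothetical helper.

```python
from collections import Counter


def synthetic_counts_to_balance(labels):
    """Return, per class, how many synthetic samples would equalize class sizes.

    Assumption: every class is topped up to the size of the largest class.
    """
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Toy imbalanced label vector: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
needed = synthetic_counts_to_balance(labels)
print(needed)  # {0: 0, 1: 80}
```

A conditional generator would then be asked for `needed[cls]` samples of each class `cls`.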

Paper Structure

This paper contains 28 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: RIGA Pipeline of Augmentation and Classification Method. Left: Tabular-to-Image Conversion with cGAN, VQVAE, and VQGAN Training (X: Data sample, y: Condition, Z: Latent Noise Vector). Right: Dual Classification Paths. Top: Real and Generated Images Classified via CNN. Bottom: Synthetic Images Converted Back to Tabular and Classified with XGBoost.
  • Figure 2: Markov Blanket without RIGA
  • Figure 3: Markov Blanket with RIGA
  • Figure 4: Real and Fake Images of four datasets.
  • Figure 5: Real and Synthetic Images Generated by RIGA-VQGAN for nuMoM2b and MI Datasets.