Toward Valid Generative Clinical Trial Data with Survival Endpoints

Perrine Chassat, Van Tuan Nguyen, Lucas Ducrot, Emilie Lanoy, Agathe Guilloux

TL;DR

The paper tackles the challenge of generating synthetic clinical trial data with time-to-event endpoints under censoring. It introduces HI-VAE, a variational autoencoder that jointly models mixed-type covariates and survival times within a unified latent space, without assuming independent censoring, and optimizes an ELBO-based objective to learn their joint distribution. A calibration-focused evaluation framework assesses fidelity, utility, privacy, and downstream type I error and power; it reveals miscalibration in naive generations and shows that a post-generation selection procedure can partially restore statistical validity. Empirical results on simulated data and four real phase III datasets show that HI-VAE outperforms survival GAN and VAE baselines on classical metrics, while also highlighting remaining challenges for safe, regulatory-grade use, particularly in achieving robust calibration and stronger privacy guarantees for public data sharing.

Abstract

Clinical trials face mounting challenges: fragmented patient populations, slow enrollment, and unsustainable costs, particularly for late phase trials in oncology and rare diseases. While external control arms built from real-world data have been explored, a promising alternative is the generation of synthetic control arms using generative AI. A central challenge is the generation of time-to-event outcomes, which constitute primary endpoints in oncology and rare disease trials, but are difficult to model under censoring and small sample sizes. Existing generative approaches, largely GAN-based, are data-hungry, unstable, and rely on strong assumptions such as independent censoring. We introduce a variational autoencoder (VAE) that jointly generates mixed-type covariates and survival outcomes within a unified latent variable framework, without assuming independent censoring. Across synthetic and real trial datasets, we evaluate our model in two realistic scenarios: (i) data sharing under privacy constraints, where synthetic controls substitute for original data, and (ii) control-arm augmentation, where synthetic patients mitigate imbalances between treated and control groups. Our method outperforms GAN baselines on fidelity, utility, and privacy metrics, while revealing systematic miscalibration of type I error and power. We propose a post-generation selection procedure that improves calibration, highlighting both progress and open challenges for generative survival modeling.
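
To make the idea of post-generation selection concrete, the following is a minimal sketch under stated assumptions, not the authors' procedure. It keeps only those generated control datasets whose survival times are not rejected by a two-sample log-rank test against the original controls. The paper refers to an adjusted log-rank test; the standard unadjusted version is swapped in here for simplicity. The function names `logrank_pvalue` and `select_candidates`, the `(times, events)` data layout, and the toy data are all hypothetical, and the trained generator that would produce the candidates is not shown.

```python
import math


def logrank_pvalue(times_a, events_a, times_b, events_b):
    """Standard (unadjusted) two-sample log-rank test; returns the p-value.

    events_* hold 1 for an observed event and 0 for a censored observation.
    """
    # Pool both groups; the last field flags the group (0 = a, 1 = b).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group a
    variance = 0.0       # hypergeometric variance of that difference
    for t in event_times:
        n = sum(1 for s, _, _ in data if s >= t)                # at risk, pooled
        n_a = sum(1 for s, _, g in data if s >= t and g == 0)   # at risk, group a
        d = sum(1 for s, e, _ in data if s == t and e)          # events, pooled
        d_a = sum(1 for s, e, g in data if s == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def select_candidates(control, candidates, alpha=0.05):
    """Keep generated datasets that the test does not reject at level alpha."""
    c_times, c_events = control
    return [
        (times, events)
        for times, events in candidates
        if logrank_pvalue(c_times, c_events, times, events) >= alpha
    ]


# Toy illustration with made-up data: one candidate mirrors the controls,
# the other has much longer survival times and should be filtered out.
control = (list(range(1, 21)), [1] * 20)
similar = (list(range(1, 21)), [1] * 20)
shifted = (list(range(101, 121)), [1] * 20)
kept = select_candidates(control, [similar, shifted])
```

In this toy setup `kept` retains only the candidate that resembles the original controls. Using non-rejection as a filter is a design choice that trades the number of usable generated datasets for closer agreement with the observed survival curve; it does not by itself guarantee calibrated type I error.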

Paper Structure

This paper contains 57 sections, 16 equations, 23 figures, 23 tables.

Figures (23)

  • Figure 1: Performance comparison on simulated and real datasets, using J-S distance, survival curve distance, and $K$-map score. Arrows indicate directions of better performance.
  • Figure 2: Type I error and power estimation for the independent case (top) and the dependent case (bottom). Dashed lines: empirical power. Green: theoretical power with reduced control size. Blue: theoretical power with generated control size.
  • Figure 3: Proportion of Monte Carlo replications with at least one generated dataset not rejected by the adjusted log-rank test (at the 5% level) against the original controls.
  • Figure 4: Type I error and power estimation after post-generation selection for the independent case (top) and the dependent case (bottom). Dashed lines: empirical power. Green: theoretical power with reduced control size. Blue: theoretical power with generated control size.
  • Figure 5: Type I error and power estimation after post-generation selection for the independent case under two training strategies: using only available control samples (top) and using both control and treated arms (bottom). Dashed lines: empirical power. Green: theoretical power with reduced control size. Blue: theoretical power with generated control size.
  • ...and 18 more figures