Generating Survival Interpretable Trajectories and Data

Andrei V. Konstantinov; Stanislav R. Kirpichenko; Lev V. Utkin

TL;DR

The paper tackles time-to-event prediction under censoring by proposing a Wasserstein variational autoencoder that jointly learns survival trajectories and data generation. It uses the Beran estimator within a robust, end-to-end framework to produce $E[T|\mathbf{x}]$ and $S(t|\mathbf{x})$, while also yielding a prototype trajectory $\xi_{\mathbf{x}}(t)$ that encodes how features should change to alter the event time. Key contributions include (i) an encoder–decoder VAE with a Bayes-weighted trajectory in latent space, (ii) a data-generation pipeline for new survival instances with censoring indicators determined by a classifier, and (iii) extensive numerical validation on synthetic and real datasets showing competitive predictive performance and interpretable trajectories. The approach advances survival data generation and explainable trajectories, with practical impact for data augmentation, counterfactual explanations, and risk prediction, and it opens avenues for integrating alternative survival models beyond Beran.
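For reference, the two predicted quantities can be written out in the standard form of the Beran estimator. This is a sketch under assumed notation, not a formula taken from the paper: the training triplets $(\mathbf{x}_i, \delta_i, t_i)$, the event indicator $\delta_i$, the kernel $K$, and the normalized weights $W$ are all symbols introduced here for illustration, and the times are assumed to be sorted in ascending order with no ties.

```latex
% Assumed notation: n training triplets (x_i, \delta_i, t_i) with t_1 < ... < t_n,
% \delta_i = 1 for an observed event and \delta_i = 0 for a censored one.
S(t \mid \mathbf{x}) = \prod_{i:\, t_i \le t}
  \left( 1 - \frac{W(\mathbf{x}, \mathbf{x}_i)}{1 - \sum_{j=1}^{i-1} W(\mathbf{x}, \mathbf{x}_j)} \right)^{\delta_i},
\qquad
W(\mathbf{x}, \mathbf{x}_i) = \frac{K(\mathbf{x}, \mathbf{x}_i)}{\sum_{j=1}^{n} K(\mathbf{x}, \mathbf{x}_j)}

% Expected time as the area under the step function S, truncated at t_n.
E[T \mid \mathbf{x}] \approx \sum_{i=1}^{n} (t_i - t_{i-1})\, S(t_{i-1} \mid \mathbf{x}),
\qquad t_0 = 0, \quad S(t_0 \mid \mathbf{x}) = 1
```

When all weights are equal, $W(\mathbf{x}, \mathbf{x}_i) = 1/n$, the product reduces to the Kaplan-Meier estimator, which is a useful sanity check on the formula.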

Abstract

A new model for generating survival trajectories and data, based on an autoencoder of a specific structure, is proposed. It solves three tasks. First, it provides predictions in the form of the expected event time and the survival function for a newly generated feature vector on the basis of the Beran estimator. Second, the model generates additional data based on a given training set to supplement the original dataset. Third, and most important, it generates a prototype time-dependent trajectory for an object, which characterizes how the features of the object could be changed to achieve a different time to an event. The trajectory can be viewed as a type of counterfactual explanation. The proposed model is robust during training and inference due to a specific weighting scheme incorporated into the variational autoencoder. The model also determines the censored indicators of newly generated data by solving a classification task. The paper demonstrates the efficiency and properties of the proposed model using numerical experiments on synthetic and real datasets. The code of the algorithm implementing the proposed model is publicly available.
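To make the first task concrete, the following is a minimal NumPy sketch of a kernel-weighted Beran estimator and of the expected event time derived from it. It is not the authors' implementation: the function names `beran_survival` and `expected_time`, the Gaussian kernel, and the `bandwidth` parameter are assumptions chosen for illustration, and tied event times are not treated specially.

```python
import numpy as np


def beran_survival(x, X_train, times, events, bandwidth=1.0):
    """Beran estimate of S(t_i | x) at each sorted training time t_i.

    Hypothetical helper: the Gaussian kernel and `bandwidth` are
    illustrative assumptions, not the paper's weighting scheme.
    """
    order = np.argsort(times)
    X_s, t_s, d_s = X_train[order], times[order], events[order]

    # Gaussian kernel weights, normalized so that they sum to one.
    sq_dist = np.sum((X_s - x) ** 2, axis=1)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    w = kernel / kernel.sum()

    # Total weight of the points that come strictly earlier in time.
    cum_prev = np.concatenate(([0.0], np.cumsum(w)[:-1]))
    denom = 1.0 - cum_prev

    # Guard against a vanishing denominator at the last time points.
    ratio = np.where(denom > 1e-12, w / np.maximum(denom, 1e-12), 1.0)

    # Censored points (event == 0) contribute a factor of one.
    factors = np.where(d_s == 1, 1.0 - ratio, 1.0)
    factors = np.clip(factors, 0.0, 1.0)
    return t_s, np.cumprod(factors)


def expected_time(t_sorted, surv):
    """Area under the step survival function, truncated at the last time."""
    # S equals 1 on [0, t_1) and surv[i] on [t_i, t_{i+1}).
    s_left = np.concatenate(([1.0], surv[:-1]))
    gaps = np.diff(np.concatenate(([0.0], t_sorted)))
    return float(np.sum(s_left * gaps))
```

With identical training feature vectors the weights become uniform and the estimate coincides with the Kaplan-Meier estimator, which gives a simple way to check the sketch.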

Paper Structure (19 sections, 29 equations, 21 figures, 1 table)

This paper contains 19 sections, 29 equations, 21 figures, 1 table.

Introduction
Concepts of survival analysis
Generating trajectories and data
Encoder part and training epochs
The prototype embedding trajectory
New data generation and the censored indicator
Decoder part
Training the VAE
Numerical experiments
Experiments with synthetic data
“Linear” dataset
Two parabolas
Two circles
Experiments with real data
Veteran dataset
...and 4 more sections

Figures (21)

Figure 1: A scheme of the proposed model
Figure 2: The original set $\mathcal{A}_{r}$ of vectors $\mathbf{x}_{i}$ and the set $\mathcal{D}_{m}$ of vectors $\widetilde{\mathbf{z}}_{1},\ldots,\widetilde{\mathbf{z}}_{m}$ normally distributed around vector $\mathbf{z}$
Figure 3: Illustration of generated points $\widehat{\mathbf{x}}$ for the “linear” dataset
Figure 4: Generated points $(\widehat{\mathbf{x}},T_{gen})$ for the “linear” dataset
Figure 5: Generated trajectories for the “linear” dataset
...and 16 more figures
