Evaluation metrics for temporal preservation in synthetic longitudinal patient data

Katariina Perkonoja, Parisa Movahedi, Antti Airola, Kari Auranen, Joni Virta

TL;DR

The paper tackles the challenge of evaluating how well synthetic longitudinal patient data preserve temporal structure. It introduces a comprehensive, kernel-smoothed, multidimensional set of resemblance metrics that quantify four aspects: marginal, covariance, individual, and measurement structures. Empirical illustrations with HALO and Health Gym GAN on MIMIC-III show that strong marginal similarity can hide distortions in covariance and individual trajectories, underscoring the need for holistic temporal fidelity assessments. Open-source implementations and practical guidance are provided to improve the reliability and utility of synthetic longitudinal data in healthcare while highlighting data quality and preprocessing as key determinants of temporal realism.

Abstract

This study introduces a set of metrics for evaluating temporal preservation in synthetic longitudinal patient data, defined as artificially generated data that mimic real patients' repeated measurements over time. The proposed metrics assess how synthetic data reproduces key temporal characteristics, categorized into marginal, covariance, individual-level and measurement structures. We show that strong marginal-level resemblance may conceal distortions in covariance and disruptions in individual-level trajectories. Temporal preservation is influenced by factors such as original data quality, measurement frequency, and preprocessing strategies, including binning, variable encoding and precision. Variables with sparse or highly irregular measurement times provide limited information for learning temporal dependencies, resulting in reduced resemblance between the synthetic and original data. No single metric adequately captures temporal preservation; instead, a multidimensional evaluation across all characteristics provides a more comprehensive assessment of synthetic data quality. Overall, the proposed metrics clarify how and why temporal structures are preserved or degraded, enabling more reliable evaluation and improvement of generative models and supporting the creation of temporally realistic synthetic longitudinal patient data.

Paper Structure (28 sections, 23 equations, 24 figures, 1 table)

Figures (24)

  • Figure 1: Evaluation metrics proposed in this work for assessing univariate temporal preservation in synthetic LPD. Some metrics are only applicable to either continuous or discrete variables.
  • Figure 2: Schematic (theoretical) variograms with exponential and Gaussian correlation under stationarity. The variance decomposes into measurement error ($\tau^{2}$), a serially correlated (autocorrelated) component ($\sigma^{2}$), and, if present, the variance of a random intercept representing between-subject variability ($\nu^{2}$).
  • Figure 3: Kernel-smoothed mean (Metric \ref{metric:mean_profile}) and quantile profiles (Metric \ref{metric:quant_prof}) of systolic blood pressure for the original data (left) and HALO-generated synthetic data (right).
  • Figure 4: Kernel-smoothed variance profile (Metric \ref{metric:variance}, blue) and variogram (Metric \ref{metric:vario}, orange) of systolic blood pressure for the original data (left) and synthetic data generated by HALO (right).
  • Figure 5: Boxplot (left) and density plot (right) showing the kernel-smoothed rank-order variability (Metric \ref{metric:rank_var}) of systolic blood pressure for original (blue) and HALO-generated synthetic data (orange).
  • ...and 19 more figures

Theorems & Definitions (5)

  • Definition 1: Normalized kernel weights
  • Definition 2: Weighted empirical cumulative distribution
  • Remark 1: Choice of bandwidth $h$
  • Definition 3: Univariate temporal preservation
  • Remark 2: Goodness of approximation
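
To make the first two definitions concrete, the following is a minimal sketch of normalized kernel weights, a weighted empirical cumulative distribution, and a kernel-smoothed mean at a target time. It is not the authors' implementation: the Gaussian kernel, the function names (`normalized_kernel_weights`, `weighted_ecdf`, `smoothed_mean`), and the toy data are assumptions made for illustration, and the paper's exact definitions may differ.

```python
import numpy as np


def normalized_kernel_weights(times, t0, h):
    """Kernel weights centred at t0 with bandwidth h, scaled to sum to 1.

    Assumption: a Gaussian kernel; the kernel used in the paper is not
    stated in this summary.
    """
    times = np.asarray(times, dtype=float)
    raw = np.exp(-0.5 * ((times - t0) / h) ** 2)
    return raw / raw.sum()


def weighted_ecdf(values, weights, y):
    """Weighted empirical CDF at y: total weight of observations <= y."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights[values <= y]))


def smoothed_mean(times, values, t0, h):
    """Kernel-smoothed mean at t0: weighted average of the observed values."""
    w = normalized_kernel_weights(times, t0, h)
    return float(np.sum(w * np.asarray(values, dtype=float)))


# Hypothetical toy measurements (times in arbitrary units).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [120.0, 125.0, 130.0, 128.0, 122.0]

w = normalized_kernel_weights(times, t0=2.0, h=1.0)
print(w.sum())                              # weights sum to one
print(weighted_ecdf(values, w, y=125.0))    # smoothed P(value <= 125) near t0
print(smoothed_mean(times, values, t0=2.0, h=1.0))
```

In this sketch a smaller `h` concentrates the weights on measurements close to `t0`, while a larger `h` averages over a wider time window, which is the trade-off a bandwidth choice has to balance.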