Multi-state Models For Disease Histories Based On Longitudinal Data

Simon Wiegrebe, Johannes Piller, Mathias Gorski, Merle Behr, Helmut Küchenhoff, Iris M. Heid, Andreas Bender

TL;DR

Simulation studies show that PAMs can handle dependent left-truncation and accommodate multiple time scales. They also quantify the extent of index event bias in multiple settings, demonstrating its dependence on the completeness of covariate adjustment.

Abstract

Multi-stage disease histories derived from longitudinal data are becoming increasingly available as registry data and biobanks expand. Multi-state models are well suited to investigating transitions between different disease stages in the presence of competing risks. In this context, however, their estimation is complicated by dependent left-truncation, multiple time scales, index event bias, and interval-censoring. In this work, we investigate the extension of piecewise exponential additive models (PAMs) to this setting and their applicability given the above challenges. In simulation studies, we show that PAMs can handle dependent left-truncation and accommodate multiple time scales. Compared to a stratified single time scale model, a multiple time scales model is found to be less robust to the data-generating process. We also quantify the extent of index event bias in multiple settings, demonstrating its dependence on the completeness of covariate adjustment. PAMs recover baseline and fixed effects well in most settings, except for baseline hazards in interval-censored data. Finally, we apply our framework to estimate multi-state transition hazards and probabilities of chronic kidney disease (CKD) onset and progression in a UK Biobank dataset (n=142,667). We observe CKD progression risk to be highest for individuals with early CKD onset and to increase further with age. In addition, the well-known genetic variant rs77924615 in the UMOD locus is found to be associated with CKD onset hazards, but not with the risk of further CKD progression.
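
The index event bias mentioned in the abstract can be illustrated with a minimal simulation sketch. It assumes two independent binary risk factors and an exponential onset time whose hazard depends on both. All names and parameter values (`h0`, `beta1`, `beta2`, `tau`, the sample size) are illustrative assumptions and are not taken from the paper, and the sketch does not reproduce the paper's data generating processes or its PAM fits. It shows only the selection mechanism: conditioning on disease onset induces a negative correlation between risk factors that are independent at baseline.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_onset(n=50_000, h0=0.1, beta1=1.0, beta2=1.0, tau=5.0, seed=1):
    """Simulate independent binary risk factors and exponential onset times.

    All parameter values are illustrative assumptions, not the paper's settings.
    Returns a list of (x1, x2, onset_observed) tuples.
    """
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        # x1 and x2 are drawn independently in the baseline population.
        x1 = int(rng.random() < 0.5)
        x2 = int(rng.random() < 0.5)
        # Proportional hazards: both risk factors raise the onset hazard.
        hazard = h0 * math.exp(beta1 * x1 + beta2 * x2)
        onset_time = rng.expovariate(hazard)
        # Onset counts as observed only if it occurs before follow-up ends.
        subjects.append((x1, x2, onset_time < tau))
    return subjects


subjects = simulate_onset()
corr_all = pearson([s[0] for s in subjects], [s[1] for s in subjects])

# Condition on the index event (disease onset) and recompute the correlation.
diseased = [s for s in subjects if s[2]]
corr_diseased = pearson([s[0] for s in diseased], [s[1] for s in diseased])

print(f"correlation at baseline: {corr_all:+.3f}")
print(f"correlation after onset: {corr_diseased:+.3f}")
```

Under these assumptions, a subject who reaches onset without one risk factor is more likely to carry the other. A progression model that omits `x2` can therefore yield a biased estimate for `x1`, which is consistent with the abstract's statement that the bias depends on the completeness of covariate adjustment.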

Paper Structure

This paper contains 33 sections, 13 equations, 21 figures, 15 tables.

Figures (21)

  • Figure 1: State diagram for Chronic Kidney Disease. This state diagram illustrates states and transitions within a chronic kidney disease (CKD) history. Circles represent transient states (Healthy (initial state), Mild CKD, Severe CKD); squares represent absorbing states (end-stage kidney disease (ESKD; considered to be absorbing here) and Death). The arrows between states indicate the allowed transitions: $0 \rightarrow 1$ (CKD onset), $1 \rightarrow 2$ (progression to Severe CKD), $2 \rightarrow 3$ (progression to ESKD), as well as transitions into Death out of all transient states.
  • Figure 2: Examples of disease histories regarding CKD. Based on the state diagram in Figure 1, this figure illustrates CKD onset and progression histories of three hypothetical subjects. Commencing at study entry (Healthy), the disease histories terminate in the absorbing states ESKD or Death or cease to be observable following right-censoring. Horizontal arrows represent time scales. The precise definitions of time scales and state-entry times, as well as the practical implementation via transition-specific helper variables, are described in Section \ref{sec:pam}.
  • Figure 3: State diagram for simulation studies. This state diagram has initial state $0$, interim state $1$, and absorbing states $2$ and $3$. Arrows depict possible transitions: $0 \rightarrow 1$ (onset), $0 \rightarrow 3$ (death without disease), $1 \rightarrow 2$ (progression), and $1 \rightarrow 3$ (death with disease).
  • Figure 4: Fixed effect estimates by transition from SSTS and MTS PAMs on data simulated from SSTS and MTS DGPs. This figure illustrates the first quartile, median, and third quartile (boxes) of effect size estimates of a binary covariate $x_1$ on log-hazards across $500$ simulation runs for transitions $k \in \{0\rightarrow1, 0\rightarrow3, 1\rightarrow2, 1\rightarrow3\}$. Results are shown for SSTS and MTS PAMs, SSTS and MTS DGPs, and two smooth types (penalized splines versus factor smooths). Whiskers represent 1.5 times the interquartile range from the first and third quartiles. Orange lines denote true effect sizes $\beta_{x_1,k}$.
  • Figure 5: Induced negative correlation between risk factors and bias in effect size estimates in simulated multi-state data. Panel a shows Pearson correlation coefficients between two risk factors $x_1$ and $x_2$ in states $0$ and $1$, simulated as independent risk factors in the healthy population under SSTS and MTS DGPs. Panel b shows effect size estimates of risk factor $x_1$ on the log-hazards of disease onset ($\hat{\beta}_{x_1,0\rightarrow1}$; first row), disease progression ($\hat{\beta}_{x_1,1\rightarrow2}$; second row), and their respective difference (third row). The underlying data are again simulated from SSTS and MTS DGPs and analyzed with SSTS and MTS PAMs. The first ($x_1$-only) model includes only the risk factor of interest, whereas the second (full) model includes both risk factors. Orange lines denote true effect sizes. The full table of results on correlations, effect size estimates, and bias for all DGPs, models, risk factor distributions, and risk factor effect sizes is available at https://github.com/survival-org/msm4diseaseHistories.
  • ...and 16 more figures
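
The simulation state diagram described in the Figure 3 caption can be written down as a small data structure. The following is a minimal sketch: the states and allowed transitions are taken from the caption, whereas the names `TRANSITIONS`, `absorbing_states`, and `is_valid_history` are hypothetical helpers introduced here for illustration and are not part of the paper's code.

```python
# Allowed transitions of the simulation state diagram (Figure 3 caption).
# Keys are (from_state, to_state); values are the labels used in the caption.
TRANSITIONS = {
    (0, 1): "onset",
    (0, 3): "death without disease",
    (1, 2): "progression",
    (1, 3): "death with disease",
}
STATES = {0, 1, 2, 3}


def absorbing_states(states, transitions):
    """Return the states that have no outgoing transition."""
    sources = {src for src, _ in transitions}
    return {s for s in states if s not in sources}


def is_valid_history(path, transitions):
    """Check that every consecutive pair of states is an allowed transition.

    A path with a single state (e.g. a subject right-censored in the
    initial state) is considered valid.
    """
    return all((a, b) in transitions for a, b in zip(path, path[1:]))


print(absorbing_states(STATES, TRANSITIONS))
print(is_valid_history([0, 1, 2], TRANSITIONS))
print(is_valid_history([0, 2], TRANSITIONS))
```

Storing the transitions as a set of pairs means the absorbing states follow from the diagram itself rather than being listed separately. The CKD diagram of Figure 1 could be encoded the same way by swapping in its own transition list.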