Joint Modeling of Longitudinal EHR Data with Shared Random Effects for Informative Visiting and Observation Processes

Cheng-Han Yang; Xu Shi; Bhramar Mukherjee

TL;DR

The paper proposes a unified semiparametric joint modeling framework that simultaneously characterizes the visiting, biomarker observation, and longitudinal outcome processes. Central to the framework is a shared subject-specific Gaussian latent variable that captures unmeasured frailty and induces dependence across all components.

Abstract

Longitudinal electronic health record (EHR) data offer opportunities to study biomarker trajectories; however, association estimates (the primary inferential target) from standard models designed for regular observation times may be biased by a two-stage hierarchical missingness mechanism. The first stage is the visiting process (informative presence), where encounters occur at irregular times driven by patient health status; the second is the observation process (informative observation), where biomarkers are selectively measured during visits. To address these mechanisms, we propose a unified semiparametric joint modeling framework that simultaneously characterizes the visiting, biomarker observation, and longitudinal outcome processes. Central to this framework is a shared subject-specific Gaussian latent variable that captures unmeasured frailty and induces dependence across all components. We develop a three-stage estimation procedure and establish the consistency and asymptotic normality of our estimators. We also introduce a sequential procedure that imputes missing biomarkers prior to adjusting for irregular visiting and examine its performance. Simulation results demonstrate that our method yields unbiased estimates under this mechanism, whereas existing approaches can be substantially biased; notably, methods adjusting only for irregular visiting may exhibit even greater bias than those ignoring both mechanisms. We apply our framework to data from the All of Us Research Program to investigate associations between neighborhood-level socioeconomic status indicators and six blood-based biomarker trajectories, providing a robust tool for outpatient settings where irregular monitoring and selective measurement are prevalent.
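The two-stage mechanism described in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' model: it assumes a homogeneous Poisson visiting process whose rate is scaled by $\exp(U_i)$, a probit observation model, and a linear outcome model, all sharing one Gaussian frailty $U_i$. Every coefficient and every function name (`simulate_subject`, `simulate_cohort`, `mean_frailty_among_measured`) is hypothetical.

```python
import math
import random


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_subject(i, rng, tau=5.0, sigma=1.0):
    """Return long-format rows for one subject on (0, tau].

    All coefficients below are made-up values chosen for illustration.
    """
    x = rng.random()              # observed baseline covariate
    u = rng.gauss(0.0, sigma)     # shared latent frailty U_i (unobserved in practice)
    rate = math.exp(0.5 + 0.3 * x + u)   # assumed visiting intensity
    rows = []
    t = rng.expovariate(rate)     # first visit time
    while t <= tau:
        # Stage 2: is the biomarker measured at this visit? (assumed probit model)
        p_obs = norm_cdf(-0.5 + 0.5 * x + 1.0 * u)
        r = 1 if rng.random() < p_obs else 0
        # Outcome exists only when measured; otherwise it is "NA" (None here).
        y = None
        if r == 1:
            y = 1.0 + 0.5 * x + 0.2 * t + 1.0 * u + rng.gauss(0.0, 1.0)
        rows.append({"id": i, "t": t, "x": x, "u": u, "r": r, "y": y})
        t += rng.expovariate(rate)  # exponential gaps give a Poisson process
    return rows


def simulate_cohort(n, seed=0):
    """Stack the long-format rows of n independent subjects."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        rows.extend(simulate_subject(i, rng))
    return rows


def mean_frailty_among_measured(rows):
    """Average latent frailty over rows where the biomarker was measured."""
    measured = [row["u"] for row in rows if row["r"] == 1]
    return sum(measured) / len(measured)


if __name__ == "__main__":
    cohort = simulate_cohort(500, seed=1)
    print(len(cohort), mean_frailty_among_measured(cohort))
```

Under these assumed loadings, subjects with a large $U_i$ visit more often and are measured more often, so the average frailty among measured rows sits well above its population mean of zero. Because the same $U_i$ enters the outcome, an analysis that treats the measured rows as a random sample inherits that shift.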

Paper Structure (72 sections, 18 theorems, 166 equations, 8 figures, 6 tables, 1 algorithm)

Introduction
Method
Model Specification
Assumptions
Three-Stage Estimation
Overview
Stage 1: Visiting Process
Stage 2: Observation Process
Stage 3: Longitudinal Outcome Model
Asymptotic Properties of (beta, theta)
Existing Methods
Outcome-Only Methods
IP-Only Methods
Imputation+IP Methods
IP+IO Methods
...and 57 more sections

Key Result

Theorem 4.1

Under Assumptions asmp:censoring--asmp:distributions and regularity conditions (C1)--(C6), let $(\widehat{\bm\beta},\widehat{\bm\theta})$ be the solution to the estimating equations eq:EE_outcome, where the conditional expectations involving $U_i$ are evaluated using the Laplace approximation. Then,

Figures (8)

Figure 1: Illustration of the hierarchical data generation process involving Informative Presence (IP) and Informative Observation (IO). Left panel: Patient timelines where clinic visits (ticks) are generated by the visiting process driven by covariates $\bm{X}_i^{\mathcal{V}}$. At each visit, the observation process (driven by $\bm{X}_i^{\mathcal{O}}(t)$) determines whether the biomarker outcome $Y_i(t)$ is measured (solid dots, $R_i^{\mathcal{Y}}(t)=1$) or unmeasured (hollow circles, $R_i^{\mathcal{Y}}(t)=0$), while the underlying longitudinal biomarker trajectory is driven by $\bm{X}_i^{\mathcal{Y}}(t)$. Right panel: The resulting long-format dataset used for analysis, where "NA" in the $Y_i(t)$ column indicates an unmeasured outcome ($R_i^{\mathcal{Y}}(t)=0$) despite the patient's presence at the clinic.
Figure 2: Three-stage estimation procedure. Gray shaded boxes indicate latent information transmitted across stages. Stages 1 and 2 estimate the nuisance parameters needed to compute the empirical Bayes posterior and the marginalized observation probabilities, respectively; these quantities are subsequently incorporated into Stage 3 to correct for clinically informed bias.
Figure 3: Evaluation of $\beta_F$ estimator performance across Setting A (Scenarios A.1--A.4). Top: Empirical bias of $\widehat{\beta}_F$ (dashed line at 0). Bottom: RMSE of $\widehat{\beta}_F$. Boxplots summarize the distributions across replicates. Estimators are grouped by modeling approach and distinguished by color: Outcome-only (green), IP-only (blue), imputation+IP (orange), and IP+IO (red).
Figure 4: Evaluation of $\beta_F$ estimator performance across Setting B (Scenarios B.1--B.6). Top: Empirical bias of $\widehat{\beta}_F$ (dashed line at 0). Bottom: RMSE of $\widehat{\beta}_F$. Boxplots summarize the distributions across replicates. Estimators are grouped by modeling approach and distinguished by color: Outcome-only (green), IP-only (blue), imputation+IP (orange), and IP+IO (red).
Figure 5: Evaluation of $\beta_F$ estimator performance across Setting C (Scenarios C.1--C.6). Top: Empirical bias of $\widehat{\beta}_F$ (dashed line at 0). Bottom: RMSE of $\widehat{\beta}_F$. Boxplots summarize the distributions across replicates. Estimators are grouped by modeling approach and distinguished by color: Outcome-only (green), IP-only (blue), imputation+IP (orange), and IP+IO (red).
...and 3 more figures

Theorems & Definitions (39)

Remark 3.1
Theorem 4.1: Consistency
Theorem 4.2: Asymptotic normality
Lemma S1: NHPP order-statistics identity
proof
Remark S2.1: Independence from the frailty
Lemma S2: Martingale compensation
proof
Lemma S3: Probit-normal convolution
proof
...and 29 more
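
Lemma S3 is listed here by name only, and its exact statement is not shown. The standard identity that usually goes by this name is: for $U \sim N(0, \sigma^2)$, $E[\Phi(a + bU)] = \Phi\big(a / \sqrt{1 + b^2\sigma^2}\big)$. The sketch below checks that textbook identity numerically, on the assumption that it is the kind of result used to marginalize a probit observation probability over a Gaussian latent variable. The function names and the values of `a`, `b`, and `sigma` are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def marginal_probit_closed_form(a, b, sigma):
    """Closed form of E[Phi(a + b*U)] for U ~ N(0, sigma^2)."""
    return norm_cdf(a / math.sqrt(1.0 + (b * sigma) ** 2))


def marginal_probit_numeric(a, b, sigma, n_grid=20001, width=10.0):
    """Same expectation by trapezoidal integration over +/- width*sigma."""
    lo = -width * sigma
    hi = width * sigma
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = lo + k * h
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        density = norm_pdf(u / sigma) / sigma   # N(0, sigma^2) density at u
        total += weight * norm_cdf(a + b * u) * density
    return total * h


if __name__ == "__main__":
    a, b, sigma = -0.5, 1.0, 1.5
    print(marginal_probit_closed_form(a, b, sigma))
    print(marginal_probit_numeric(a, b, sigma))
```

The closed form avoids numerical integration whenever the latent variable is Gaussian. Whether the paper applies it to the prior or to a posterior distribution of $U_i$ is not stated in this excerpt.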
