Investigating symptom duration using current status data: a case study of post-acute COVID-19 syndrome

Charles J. Wolock; Susan Jacob; Julia C. Bennett; Anna Elias-Warren; Jessica O'Hanlon; Avi Kenny; Nicholas P. Jewell; Andrea Rotnitzky; Stephen R. Cole; Ana A. Weil; Helen Y. Chu; Marco Carone

TL;DR

The paper addresses estimation of the distribution of time to symptom resolution for post-acute COVID-19 using current status data from a large university cohort. It introduces extended causal isotonic regression (extended CIR), which relaxes the uninformative-response-time assumption by requiring only that the response time and the event time be conditionally independent within covariate strata; the method also accommodates flexible machine-learning nuisance estimation and survey nonresponse. In the Husky Coronavirus Testing study, the authors estimate the proportion of participants with persistent symptoms at $30$ and $90$ days (approximately $19\%$ and $7\%$, respectively), identify risk factors for slower resolution (e.g., female sex, fatigue during acute infection, higher viral load), and perform a sensitivity analysis to gauge robustness to dependence between response and symptom resolution times. In extensive simulations, extended CIR demonstrates favorable bias and coverage relative to traditional methods. The work additionally discusses regression analyses with current status data via Cox models and bootstrap inference, highlighting practical guidelines for inference in the presence of nonresponse.

Abstract

For infectious diseases, characterizing symptom duration is of clinical and public health importance. Symptom duration may be assessed by surveying infected individuals and querying symptom status at the time of survey response. For example, in a SARS-CoV-2 testing program at the University of Washington, participants were surveyed at least $28$ days after testing positive and asked to report current symptom status. This study design yielded current status data: outcome measurements for each respondent consisted only of the time of survey response and a binary indicator of whether symptoms had resolved by that time. Such a study design benefits from limited risk of recall bias, but analyzing the resulting data necessitates tailored statistical tools. Here, we review methods for current status data and describe a novel application of modern nonparametric techniques to this setting. The proposed approach is valid under weaker assumptions than existing methods, allows use of flexible machine learning tools, and handles potential survey nonresponse. From the university study, under an assumption that the survey response time is conditionally independent of symptom resolution time within strata of measured covariates, we estimate that 19% of participants experienced ongoing symptoms 30 days after testing positive, decreasing to 7% at 90 days. We assess the sensitivity of these results to deviations from conditional independence, finding the estimates to be more sensitive to assumption violations at 30 days than at 90 days. Female sex, fatigue during acute infection, and higher viral load were associated with slower symptom resolution.
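
To make the current status data structure concrete, the following is a minimal sketch and not the authors' extended CIR method. It simulates current status data and computes the classical nonparametric maximum likelihood estimator (NPMLE), which amounts to an isotonic regression of the resolution indicator on the response time. The sketch assumes fully uninformative response times and no nonresponse, which are the assumptions the paper relaxes. All function names, the exponential resolution time with a 20-day mean, and the uniform response-time distribution on 28 to 95 days are illustrative assumptions, not values from the study.

```python
import numpy as np


def pava(y, w):
    """Weighted pool-adjacent-violators: nondecreasing least-squares fit to y."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])


def npmle_current_status(response_time, resolved):
    """Return unique response times and the estimated proportion still symptomatic."""
    t = np.asarray(response_time, dtype=float)
    d = np.asarray(resolved, dtype=float)
    # Aggregate tied response times so each time gets a single fitted value.
    t_unique, inv, counts = np.unique(t, return_inverse=True, return_counts=True)
    means = np.bincount(inv, weights=d) / counts
    cdf = pava(means, counts)  # monotone estimate of P(resolution time <= t)
    return t_unique, 1.0 - cdf


def surv_at(t_grid, surv, q):
    """Evaluate the right-continuous step estimate at time q (hypothetical helper)."""
    idx = np.searchsorted(t_grid, q, side="right") - 1
    return float(surv[max(idx, 0)])


# Simulated example under hypothetical parameters (not the HCT data).
rng = np.random.default_rng(0)
n = 5000
resolution_time = rng.exponential(scale=20.0, size=n)  # never observed directly
response_time = rng.uniform(28.0, 95.0, size=n)        # survey response time
resolved = (resolution_time <= response_time).astype(float)  # observed indicator

t_grid, surv = npmle_current_status(response_time, resolved)
for day in (30, 60, 90):
    truth = float(np.exp(-day / 20.0))
    print(day, round(surv_at(t_grid, surv, day), 3), round(truth, 3))
```

Only `response_time` and `resolved` enter the estimator, mirroring the fact that each respondent contributes a single time and a binary indicator. Aggregating tied times before pooling keeps the fitted curve single-valued at each observed time.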

Paper Structure (14 sections, 19 equations, 16 figures, 2 tables)

Causal isotonic regression for current status data
Data structure and target of estimation
Estimation and inference
Sensitivity analysis
Regression analysis
Simulation studies
Simulations comparing extended CIR to alternative methods
Robustness and stability simulations
Sensitivity analysis simulations
Cox proportional hazards regressions simulations
Supplementary information for HCT data analysis
Eligibility criteria for HCT enrollment
Algorithm library for Super Learner
Supplementary figures

Figures (16)

Figure 1: Standardized survival curve for time from positive test until symptom resolution, estimated using the proposed extended causal isotonic regression method (top panel), along with the distribution of survey response times between $28$ and $95$ days (bottom panel). The solid black line shows the estimated proportion with ongoing symptoms at times from $30$ days to $90$ days after the positive test. The black points show the estimated proportion with ongoing symptoms at $30$, $60$, and $90$ days. The dashed lines represent a pointwise 95% confidence interval. The gray vertical lines denote $30$, $60$, and $90$ days.
Figure 2: Sensitivity analysis for deviation from conditionally uninformative response time assumption. From top to bottom, the rows correspond to the estimated proportion with ongoing symptoms at $30$, $60$, and $90$ days after the positive test. The $x$-axis represents the value of Kendall's rank correlation coefficient $\tau$ used to encode conditional dependence in each sensitivity analysis. At $\tau = 0$, in black, we give the estimate and confidence interval obtained from the proposed extended CIR procedure assuming conditional independence (primary analysis). Gray crosses represent estimates corresponding to different values of $\tau$.
Figure S1: Pointwise bias of procedures for estimating the survival function under current status sampling with nonresponse. From left to right, the columns represent nonresponse Scenarios 1, 2 and 3, ordered from smallest to largest degree of nonresponse. From top to bottom, the rows denote the standard CIR method using only complete cases, the extended CIR method accounting for nonresponse, and the NPMLE. The $x$-axis displays the true value of the distribution function. The black line denotes zero bias.
Figure S2: Integrated absolute bias of procedures for estimating the survival function under current status sampling with nonresponse. From left to right, the columns represent nonresponse Scenarios 1, 2 and 3, ordered from lowest to highest nonresponse level. The three lines in color denote the standard CIR method using only complete cases, the extended CIR method accounting for nonresponse, and the NPMLE. The $x$-axis corresponds to sample size. The black line denotes zero bias.
Figure S3: Confidence interval coverage using the CIR procedure for estimating the survival function under current status sampling with nonresponse. From left to right, the columns represent nonresponse Scenarios 1, 2 and 3, ordered from smallest to largest degree of nonresponse. From top to bottom, the rows denote the standard CIR method using only complete cases, the extended CIR method accounting for nonresponse, and the NPMLE. The $x$-axis displays the true value of the distribution function. The black line denotes the nominal 95% coverage level.
...and 11 more figures
