Investigating symptom duration using current status data: a case study of post-acute COVID-19 syndrome
Charles J. Wolock, Susan Jacob, Julia C. Bennett, Anna Elias-Warren, Jessica O'Hanlon, Avi Kenny, Nicholas P. Jewell, Andrea Rotnitzky, Stephen R. Cole, Ana A. Weil, Helen Y. Chu, Marco Carone
TL;DR
The paper addresses estimation of the distribution of time to symptom resolution for post-acute COVID-19, using current status data from a large university cohort. It introduces extended causal isotonic regression (extended CIR), which relaxes the uninformative-response-time assumption by requiring only that the response time and the event time be conditionally independent within covariate strata, and which accommodates flexible machine-learning nuisance estimation and survey nonresponse. In the Husky Coronavirus Testing study, the authors estimate the proportion of participants with persistent symptoms at $30$ and $90$ days (approximately $19\%$ and $7\%$, respectively), identify risk factors for slower resolution (e.g., female sex, fatigue during acute infection, higher viral load), and perform a sensitivity analysis to gauge robustness to dependence between response and symptom resolution times. In extensive simulations, extended CIR demonstrates favorable bias and coverage relative to traditional methods. The work additionally discusses regression analyses with current status data via Cox models and bootstrap inference, highlighting practical guidelines for inference in the presence of nonresponse.
Abstract
For infectious diseases, characterizing symptom duration is of clinical and public health importance. Symptom duration may be assessed by surveying infected individuals and querying symptom status at the time of survey response. For example, in a SARS-CoV-2 testing program at the University of Washington, participants were surveyed at least $28$ days after testing positive and asked to report their current symptom status. This study design yielded current status data: outcome measurements for each respondent consisted only of the time of survey response and a binary indicator of whether symptoms had resolved by that time. Such a study design benefits from limited risk of recall bias, but analyzing the resulting data necessitates tailored statistical tools. Here, we review methods for current status data and describe a novel application of modern nonparametric techniques to this setting. The proposed approach is valid under weaker assumptions than existing methods, allows use of flexible machine learning tools, and handles potential survey nonresponse. From the university study, under an assumption that the survey response time is conditionally independent of symptom resolution time within strata of measured covariates, we estimate that 19% of participants experienced ongoing symptoms 30 days after testing positive, decreasing to 7% at 90 days. We assess the sensitivity of these results to deviations from conditional independence, finding the estimates to be more sensitive to assumption violations at 30 days than at 90 days. Female sex, fatigue during acute infection, and higher viral load were associated with slower symptom resolution.
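To make the current status data structure described above concrete, the following is a minimal sketch of the classical nonparametric maximum likelihood estimator, which is the isotonic regression of the resolution indicator on the response time. It is not the extended CIR procedure proposed in the paper: it assumes the response time is fully independent of the resolution time (no covariate adjustment) and ignores nonresponse. The simulated data, the distributions used to generate them, and the function names `pava`, `current_status_npmle`, and `survival_at` are all hypothetical and chosen for illustration only.

```python
import numpy as np


def pava(y, w):
    """Pool-adjacent-violators: weighted least-squares nondecreasing fit to y."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])


def current_status_npmle(response_time, resolved):
    """Estimate F(t) = P(resolution time <= t) at the unique response times."""
    t = np.asarray(response_time, dtype=float)
    d = np.asarray(resolved, dtype=float)
    # Pool tied response times first, weighting each by its number of respondents.
    times, inverse, counts = np.unique(t, return_inverse=True, return_counts=True)
    means = np.bincount(inverse, weights=d) / counts
    return times, pava(means, counts)


def survival_at(times, F_hat, t0):
    """Step-function estimate of P(symptoms persist beyond t0)."""
    idx = int(np.searchsorted(times, t0, side="right")) - 1
    return 1.0 if idx < 0 else 1.0 - float(F_hat[idx])


# Hypothetical simulated cohort (not the study data).
rng = np.random.default_rng(0)
n = 2000
event_time = rng.exponential(scale=20.0, size=n)   # unobserved resolution times, in days
response_time = rng.uniform(28.0, 120.0, size=n)   # survey response times, in days
resolved = (event_time <= response_time).astype(float)  # observed current status

times, F_hat = current_status_npmle(response_time, resolved)
for day in (30, 90):
    print(day, survival_at(times, F_hat, day))
```

Only `response_time` and `resolved` enter the estimator; the simulated `event_time` is never observed directly, which mirrors the survey design. Because no respondent in this simulation answers before day 28, the estimate near day 30 rests on few observations.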
