Correcting Measurement Error and Zero Inflation in Functional Covariates for Scalar-on-Function Quantile Regression
Caihong Qin, Lan Xue, Ufuk Beyaztas, Roger S. Zoh, Mark Benden, Jeff Goldsmith, Carmen D. Tekwe
TL;DR
The paper tackles measurement error and zero inflation in wearable-derived functional covariates. It proposes a two-stage BE-ZIME framework that first recovers latent curves $\widehat{X}_i(t)$ and time-varying zero-inflation probabilities $\widehat{\pi}_i(t)$, then fits a joint scalar-on-function quantile regression to quantify covariate effects across quantile levels. The first stage introduces a latent covariate model with a subject-specific, time-varying validity indicator and iteratively estimates $X_i(t)$ and $\pi_i(t)$ via basis expansions and linear mixed models; the second stage uses the corrected $\widehat{X}_i(t)$ in a coherent, non-crossing set of quantile regressions. Across extensive simulations with constant and piecewise-constant zero inflation, BE-ZIME substantially improves estimation accuracy over methods that address only measurement error, and joint estimation yields notable gains over fitting each quantile separately. In a childhood obesity study, correcting step counts for zero inflation and measurement error yields functional coefficients for BMI change that closely resemble those obtained from energy expenditure, supporting corrected step counts as a practical proxy for physical activity. Overall, the approach provides a unified framework for reliable inference on how time-varying behavioral covariates measured by wearables relate to health outcomes.
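To make the role of the validity indicator concrete, the sketch below simulates one subject's repeated daily curves under an assumed multiplicative model, $W_{ij}(t) = \Delta_{ij}(t)\{X_i(t) + U_{ij}(t)\}$ with $\Delta_{ij}(t) \sim \text{Bernoulli}\{1 - \pi_i(t)\}$, using a piecewise-constant $\pi_i(t)$. The replicate (day) index $j$, the Gaussian error $U_{ij}(t)$, the specific curves, and all function names are hypothetical choices for illustration. It contrasts a naive pointwise mean, which treats structural zeros as real values and is pulled toward zero by about $\pi_i(t)X_i(t)$, with simple moment estimates (share of zeros for $\pi_i(t)$, mean of nonzero values for $X_i(t)$). This is not the BE-ZIME estimator, which relies on basis expansions, linear mixed models, and maximum likelihood; it only illustrates why ignoring zero inflation biases the recovered curve.

```python
import math
import random


def simulate_subject(x_true, pi_true, sigma_u, n_days, rng):
    """Simulate n_days observed curves for one subject on a common grid.

    Assumed (illustrative) model: W_ij(t) = Delta_ij(t) * (X_i(t) + U_ij(t)),
    with Delta_ij(t) ~ Bernoulli(1 - pi_i(t)) and U_ij(t) ~ N(0, sigma_u^2).
    """
    curves = []
    for _ in range(n_days):
        curve = []
        for x, p in zip(x_true, pi_true):
            valid = rng.random() >= p  # validity indicator: True with prob 1 - pi
            curve.append(x + rng.gauss(0.0, sigma_u) if valid else 0.0)
        curves.append(curve)
    return curves


def naive_mean(curves):
    """Pointwise average that treats every zero as a genuine value."""
    n = len(curves)
    return [sum(col) / n for col in zip(*curves)]


def zero_aware_estimates(curves):
    """Pointwise moment estimates: pi(t) = share of zeros, X(t) = mean of nonzeros."""
    pi_hat, x_hat = [], []
    for col in zip(*curves):
        nonzero = [w for w in col if w != 0.0]
        pi_hat.append(1.0 - len(nonzero) / len(col))
        x_hat.append(sum(nonzero) / len(nonzero) if nonzero else float("nan"))
    return pi_hat, x_hat


def mse(est, truth):
    """Mean squared error over the grid."""
    return sum((e - t) ** 2 for e, t in zip(est, truth)) / len(truth)


rng = random.Random(2024)
grid = [k / 49 for k in range(50)]
x_true = [5.0 + 2.0 * math.sin(2 * math.pi * t) for t in grid]
pi_true = [0.1 if t < 0.5 else 0.4 for t in grid]  # piecewise-constant zero inflation

curves = simulate_subject(x_true, pi_true, sigma_u=1.0, n_days=200, rng=rng)
naive = naive_mean(curves)
pi_hat, x_hat = zero_aware_estimates(curves)

print(f"naive MSE:         {mse(naive, x_true):.3f}")
print(f"zero-aware MSE:    {mse(x_hat, x_true):.3f}")
print(f"max |pi_hat - pi|: {max(abs(a - b) for a, b in zip(pi_hat, pi_true)):.3f}")
```

Here averaging over many replicate days is what absorbs the measurement error; with few replicates per subject these pointwise estimates become noisy, which is where a smoothed, model-based estimator earns its keep.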
Abstract
Wearable devices collect time-varying biobehavioral data, offering opportunities to investigate how behaviors influence health outcomes. However, these data often contain measurement error and excess zeros (due to nonwear, sedentary behavior, or connectivity issues), each characterized by subject-specific distributions. Existing statistical methods do not address both issues simultaneously. We introduce a novel modeling framework for zero-inflated and error-prone functional data that incorporates a subject-specific, time-varying validity indicator to explicitly distinguish structural zeros from intrinsic values. We iteratively estimate the latent functional covariates and zero-inflation probabilities via maximum likelihood, using basis expansions and linear mixed models to adjust for measurement error. To assess the effects of the recovered latent covariates, we apply joint quantile regression across multiple quantile levels. Through extensive simulations, we demonstrate that our approach substantially improves estimation accuracy over methods that address only measurement error, and that joint estimation yields substantial improvements over fitting separate quantile regressions. Applied to a childhood obesity study, our approach effectively corrects for zero inflation and measurement error in step counts, yielding results that closely align with those from energy expenditure and supporting the use of corrected step counts as a proxy for physical activity.
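One generic way to write the second-stage model, assumed here for illustration rather than taken from the paper, is a linear quantile model in the corrected curve $\widehat{X}_i(t)$ on an assumed domain $[0,1]$, with each coefficient function $\beta_\tau(t)$ expanded in $K$ hypothetical basis functions $b_k(t)$ so that the functional term reduces to a finite vector of scores $w_{ik}$. The joint fit sums the check loss $\rho_\tau$ over quantile levels $\tau_1 < \dots < \tau_L$, and the ordering constraints keep the fitted quantiles from crossing.

```latex
\begin{aligned}
Q_{Y_i}(\tau \mid \widehat{X}_i) &= \alpha(\tau) + \int_0^1 \widehat{X}_i(t)\,\beta_\tau(t)\,dt,
\qquad \beta_\tau(t) = \sum_{k=1}^{K} \theta_k(\tau)\, b_k(t), \\
\int_0^1 \widehat{X}_i(t)\,\beta_\tau(t)\,dt &= \sum_{k=1}^{K} w_{ik}\,\theta_k(\tau),
\qquad w_{ik} = \int_0^1 \widehat{X}_i(t)\, b_k(t)\,dt, \\
\min_{\{\alpha(\tau_l),\,\boldsymbol{\theta}(\tau_l)\}_{l=1}^{L}}
&\sum_{l=1}^{L} \sum_{i=1}^{n} \rho_{\tau_l}\!\Big(Y_i - \alpha(\tau_l) - \mathbf{w}_i^\top \boldsymbol{\theta}(\tau_l)\Big),
\qquad \rho_\tau(u) = u\,\{\tau - \mathbb{1}(u < 0)\}, \\
\text{s.t.}\quad & \alpha(\tau_l) + \mathbf{w}_i^\top \boldsymbol{\theta}(\tau_l) \le \alpha(\tau_{l+1}) + \mathbf{w}_i^\top \boldsymbol{\theta}(\tau_{l+1}),
\quad i = 1,\dots,n,\; l = 1,\dots,L-1.
\end{aligned}
```

Without the ordering constraints (or some other structure shared across levels), the double sum separates into $L$ independent quantile regressions, so any gain from joint estimation has to come from the coupling across the $\tau_l$.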
