Correcting Measurement Error and Zero Inflation in Functional Covariates for Scalar-on-Function Quantile Regression

Caihong Qin, Lan Xue, Ufuk Beyaztas, Roger S. Zoh, Mark Benden, Jeff Goldsmith, Carmen D. Tekwe

TL;DR

The paper tackles measurement error and zero inflation in wearable-derived functional covariates, proposing a two-stage BE-ZIME framework to recover latent curves $\widehat{X}_i(t)$ and time-varying zero-inflation probabilities $\hat{\pi}_i(t)$, followed by a joint scalar-on-function quantile regression to quantify effects across quantiles. It introduces a latent covariate model with a subject-specific validity indicator and iteratively estimates $X_i(t)$ and $\pi_i(t)$ via basis expansions and linear mixed models, then uses the corrected $\widehat{X}_i(t)$ in a coherent, non-crossing set of quantile regressions. Across extensive simulations with constant and piecewise-constant zero inflation, BE-ZIME substantially improves estimation accuracy over methods that address only measurement error, and joint estimation yields notable gains over separate quantile fits. In a real childhood obesity study, correcting step counts for zero inflation and measurement error yields functional coefficients for BMI change that closely resemble those from energy expenditure, supporting step counts as a practical proxy when properly corrected. Overall, the approach provides a unified, scalable framework for reliable inference on how time-varying behavioral covariates from wearable data relate to health outcomes.
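
To see why zero inflation has to be modeled rather than ignored, the sketch below simulates one subject under an assumed observation model, $W_i(t) = \delta_i(t)\{X_i(t) + U_i(t)\}$, where the validity indicator $\delta_i(t) \sim \mathrm{Bernoulli}(1 - \pi_i)$ and $U_i(t)$ is Gaussian error. This multiplicative form, the constant latent value, and the helper `simulate_observed` are illustrative assumptions, not the paper's exact specification. Under this model, averaging the raw observations shrinks the latent mean by a factor of $1 - \pi_i$, and dividing by $1 - \pi_i$ removes that bias only when $\pi_i$ is known, which motivates estimating $\pi_i(t)$ alongside $X_i(t)$.

```python
import random
import statistics


def simulate_observed(x_latent, pi, sigma_u, n_points, seed=0):
    """Draw n_points observations W = delta * (X + U) for one subject.

    Hypothetical model for illustration: delta ~ Bernoulli(1 - pi) marks a
    valid reading, and delta = 0 yields a structural zero regardless of X, U.
    """
    rng = random.Random(seed)
    obs = []
    for _ in range(n_points):
        valid = rng.random() >= pi          # P(valid) = 1 - pi
        error = rng.gauss(0.0, sigma_u)     # additive measurement error U
        obs.append((x_latent + error) if valid else 0.0)
    return obs


x_true, pi_true = 5.0, 0.3
w = simulate_observed(x_true, pi_true, sigma_u=1.0, n_points=100_000)

naive_mean = statistics.fmean(w)                # near (1 - pi) * X = 3.5
corrected_mean = naive_mean / (1.0 - pi_true)   # near X = 5.0, but needs known pi
print(f"naive={naive_mean:.3f} corrected={corrected_mean:.3f}")
```

In practice $\pi_i(t)$ varies across subjects and over time, so a single known constant is not available; the sketch only isolates the direction and size of the bias.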

Abstract

Wearable devices collect time-varying biobehavioral data, offering opportunities to investigate how behaviors influence health outcomes. However, these data often contain measurement error and excess zeros (due to nonwear, sedentary behavior, or connectivity issues), each characterized by subject-specific distributions. Current statistical methods fail to address these issues simultaneously. We introduce a novel modeling framework for zero-inflated and error-prone functional data by incorporating a subject-specific time-varying validity indicator that explicitly distinguishes structural zeros from intrinsic values. We iteratively estimate the latent functional covariates and zero-inflation probabilities via maximum likelihood, using basis expansions and linear mixed models to adjust for measurement error. To assess the effects of the recovered latent covariates, we apply joint quantile regression across multiple quantile levels. Through extensive simulations, we demonstrate that our approach significantly improves estimation accuracy over methods that only address measurement error, and joint estimation yields substantial improvements compared with fitting separate quantile regressions. Applied to a childhood obesity study, our approach effectively corrects for zero inflation and measurement error in step counts, yielding results that closely align with energy expenditure and supporting their use as a proxy for physical activity.
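
Each quantile regression in the second stage minimizes the check (pinball) loss $\rho_\tau(u) = u\{\tau - \mathbb{1}(u < 0)\}$. The sketch below shows, for an intercept-only model on toy data, that minimizing this loss over candidate values returns the $\tau$-th sample quantile. It also adds monotone rearrangement (sorting fitted quantiles across $\tau$), one simple generic remedy for quantile crossing when levels are fitted separately. Rearrangement is a stand-in chosen for illustration; the joint estimation described above may obtain non-crossing fits differently, and the helper names are hypothetical.

```python
def check_loss(u, tau):
    """Pinball loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_constant_quantile(y, tau):
    """Intercept-only quantile regression by exhaustive search.

    The total check loss is convex and piecewise linear in c with kinks at
    the observations, so some minimizer always lies at a data value.
    """
    return min(sorted(y), key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


def rearrange(fitted_by_tau):
    """Monotone rearrangement at one covariate value: given fitted quantiles
    listed in increasing tau order, sort them so they are non-decreasing."""
    return sorted(fitted_by_tau)


y = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0, 5.5]   # toy data
taus = [0.25, 0.5, 0.75]
print([fit_constant_quantile(y, t) for t in taus])  # [2.0, 4.0, 5.5]

# Hypothetical crossed estimates at the three tau levels (e.g. separate fits)
crossed = [2.1, 1.8, 4.0]
print(rearrange(crossed))  # [1.8, 2.1, 4.0]
```

Fitting all levels jointly, as the paper does, lets information be shared across $\tau$; post hoc sorting only repairs the ordering and does not borrow strength.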

Paper Structure (11 sections, 15 equations, 6 figures, 14 tables, 1 algorithm)

Figures (6)

  • Figure 1: Minute-level step counts from one student on five different days, from 8:00 to 14:00, in a childhood obesity study (Benden et al., 2014). Red points mark zero step counts.
  • Figure 2: Mean estimates of $\beta(t)=0.5\sin(\pi t)$ at $\tau=0.5$ under different zero-inflation levels, with sample size $n = 100$ and observed time points $L=100$, over 500 replications. Here, $\pi_i$ represents the zero-inflation probability of the $i$-th subject, with $\pi_i \sim \mathrm{Uniform}(\pi_0 - \pi_\delta,\, \pi_0 + \pi_\delta)$. The bold blue curve denotes the true coefficient.
  • Figure 3: Mean estimates of $\beta(t)=\sin(2\pi t)$ at $\tau=0.5$ under different zero-inflation levels, with sample size $n = 100$ and observed time points $L=100$, over 500 replications. Here, $\pi_i$ represents the zero-inflation probability of the $i$-th subject, with $\pi_i \sim \mathrm{Uniform}(\pi_0 - \pi_\delta,\, \pi_0 + \pi_\delta)$. The bold blue curve denotes the true coefficient.
  • Figure 4: Visualization of functional covariates. (a) Mean minute-level step counts and energy expenditure between 8:00 AM and 2:00 PM on separate axes, aggregated across students; (b) Mean corrected step counts estimated using four different methods over the same time period.
  • Figure 5: Analysis of zero-inflated patterns in children's step counts. (a) Histogram showing the distribution of zero-observation percentages among students, reflecting strong subject-specific heterogeneity; (b) Box plots of zero-observation percentages across hour intervals, illustrating substantial time variation in zero frequencies; (c) Mean–variance scatter plot computed over all time points, with each point representing one student, revealing a strong linear relationship and substantial overdispersion that support the presence of zero inflation. The fitted linear model is $\text{Variance} = -53.25 + 57.02 \times \text{Mean}$.
  • ...and 1 more figure
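
The captions of Figures 2 and 3 fix the true coefficients $\beta(t)=0.5\sin(\pi t)$ and $\beta(t)=\sin(2\pi t)$, $n = 100$ subjects, $L = 100$ observed time points, and subject-level zero-inflation probabilities $\pi_i \sim \mathrm{Uniform}(\pi_0 - \pi_\delta,\, \pi_0 + \pi_\delta)$. The sketch below draws the $\pi_i$ and evaluates the functional linear term $\int_0^1 X_i(t)\beta(t)\,dt$ on the observation grid with the trapezoidal rule. The domain $[0, 1]$, the equally spaced grid, the example values $\pi_0 = 0.3$ and $\pi_\delta = 0.1$, the constant test curve, and all helper names are assumptions for illustration; the captions do not specify how the latent curves are generated.

```python
import math
import random


def draw_subject_pi(n, pi0, pi_delta, seed=0):
    """One zero-inflation probability per subject:
    pi_i ~ Uniform(pi0 - pi_delta, pi0 + pi_delta)."""
    rng = random.Random(seed)
    return [rng.uniform(pi0 - pi_delta, pi0 + pi_delta) for _ in range(n)]


def beta_smooth(t):
    return 0.5 * math.sin(math.pi * t)       # coefficient used in Figure 2


def beta_oscillating(t):
    return math.sin(2.0 * math.pi * t)       # coefficient used in Figure 3


def functional_term(x_values, beta, grid):
    """Trapezoidal approximation of the integral of X(t) * beta(t) over the grid."""
    f = [x * beta(t) for x, t in zip(x_values, grid)]
    return sum(
        0.5 * (f[k] + f[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )


L = 100
grid = [k / (L - 1) for k in range(L)]       # assumed equally spaced on [0, 1]
pis = draw_subject_pi(n=100, pi0=0.3, pi_delta=0.1)

x_const = [1.0] * L                          # hypothetical latent curve X(t) = 1
print(functional_term(x_const, beta_smooth, grid))       # close to 1/pi
print(functional_term(x_const, beta_oscillating, grid))  # close to 0
```

With $X(t) = 1$, the exact integrals are $1/\pi$ and $0$, which gives a quick check that the quadrature is wired correctly before substituting simulated or corrected curves $\widehat{X}_i(t)$.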