Multivariate Functional Principal Component Analysis for Mixed-Type mHealth Data: An Application to Mood Disorders

Debangan Dey, Rahul Ghosal, Kathleen Merikangas, Vadim Zipunnikov

Abstract

Modern mobile health (mHealth) assessment combines self-reported measures of participants' health experiences with passively collected health behavior data throughout the day. These data are collected across multiple measurement scales, including continuous (physical activity), truncated (pain), ordinal (mood), and binary (daily life events). When indexed by time of day and stacked across assessment domains, these data structures can be treated as multivariate functional data comprising continuous, truncated, ordinal, and binary variables. Motivated by these applications, we propose a multivariate functional principal component analysis for mixed-type data ($M^2$FPCA). The approach is based on a semiparametric Gaussian copula model and assumes that the observed data arise from an underlying multivariate generalized latent nonparanormal functional process. Latent temporal and inter-variable dependence are estimated semiparametrically through Kendall's tau bridging method. Two covariance estimation procedures are developed: a fully multivariate block-wise estimator and a computationally efficient alternative based on partial separability that assumes shared principal components across domains. The proposed method yields interpretable latent functional principal component scores that can serve as participant-specific digital biomarkers. Simulation studies demonstrate the method's competitive performance under various complex dependence structures. The method is applied to mHealth data from 307 participants in the National Institute of Mental Health Family Study of Mood and Affective Spectrum Disorders. Our approach identifies time-of-day patterns shared across mood, anxiety, energy, and physical activity that meaningfully stratify mood disorder subtypes.

Paper Structure (29 sections, 1 theorem, 8 equations, 12 figures, 2 tables, 2 algorithms)

This paper contains 29 sections, 1 theorem, 8 equations, 12 figures, 2 tables, 2 algorithms.

Key Result

Theorem B.1

Let $X_j$ and $X_k$ be two GLNPP variables generated from an underlying latent bivariate normal vector with correlation $\rho$. Then the population Kendall’s $\tau$ satisfies $\tau_{jk}=F(\rho)$, where $F$ additionally depends on the cutoffs $\Delta_j,\Delta_k$ for non-continuous components. The bri... Here $\Phi$ denotes the standard normal CDF, and $\Phi_d(\cdot;S)$ denotes the CDF of a $d$-variate...
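For intuition, the simplest instance of the relation $\tau_{jk}=F(\rho)$ is a pair of continuous components, where no cutoffs enter and the bridge reduces to the well-known Gaussian copula identity $\tau = (2/\pi)\arcsin(\rho)$. The sketch below is an illustration under that assumption, not the authors' implementation: the helper names `kendall_tau` and `latent_corr_from_tau`, the sample size, and the monotone transformations are all hypothetical choices, and the bridging functions for truncated, ordinal, and binary components are not reproduced.

```python
import numpy as np


def kendall_tau(a, b):
    """Sample Kendall's tau (tau-a) from pairwise sign concordance."""
    sa = np.sign(a[:, None] - a[None, :])
    sb = np.sign(b[:, None] - b[None, :])
    n = len(a)
    # The diagonal contributes zero, so divide by the n(n-1) ordered pairs.
    return (sa * sb).sum() / (n * (n - 1))


def latent_corr_from_tau(tau):
    """Invert the continuous-continuous bridge tau = (2/pi) * arcsin(rho)."""
    return np.sin(np.pi * tau / 2.0)


rng = np.random.default_rng(0)
rho_true = 0.6  # latent correlation (assumed value for the demo)
cov = np.array([[1.0, rho_true], [rho_true, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=1500)

# Strictly increasing transformations change the marginals
# but leave Kendall's tau, and hence the bridge, unchanged.
x1 = np.exp(z[:, 0])
x2 = z[:, 1] ** 3

tau_hat = kendall_tau(x1, x2)
rho_hat = latent_corr_from_tau(tau_hat)
print(f"tau_hat={tau_hat:.3f}, rho_hat={rho_hat:.3f}, rho_true={rho_true}")
```

Because the estimate depends on the data only through ranks, the latent correlation is recovered without estimating the unknown marginal transformations, which is what makes the approach semiparametric.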

Figures (12)

  • Figure 1: Diurnal trajectories across four behavioral domains for selected participants on three days of the week. Rows correspond to domains (sad mood, anxiousness, energy, and total locomotor activity), and columns correspond to days of the week. Within each panel, colored points represent observed subject-specific weekday-averaged measurements collected at irregular times between 7:00 and 22:00, while line segments connect consecutive observed time points for each subject. Ordinal EMA domains are displayed on a common scale from 1 to 7, whereas actigraphy retains its natural scale. The figure highlights substantial between-subject heterogeneity, irregular observation patterns, and cross-domain temporal structure that motivate a joint multivariate functional modeling approach.
  • Figure 2: Data-generating mechanism under the Multivariate Generalized Latent Nonparanormal Process (MGLNPP). Cross-component dependence is induced through a latent multivariate Gaussian process. Component-wise monotone transformations produce a latent continuous process, which is mapped to observed mixed-type functional data via type-specific thresholding or truncation and observed on irregular grids.
  • Figure 3: True covariance surface and Monte Carlo means of the estimated covariance surfaces for the stationary correlation kernel with $n=500$, from the naive MFPCA (MFPCA$_N$) and the proposed $M^2$FPCA and ps-$M^2$FPCA.
  • Figure 4: Comparison of estimated correlation surfaces under the fully multivariate $M^2$FPCA and partially separable (PS) models.
  • Figure 5: Effects of latent daily features in a multinomial logistic LASSO model classifying diagnostic groups vs. controls. Positive values indicate increased log-odds of diagnosis relative to controls.
  • ...and 7 more figures
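
The data-generating mechanism described in the Figure 2 caption can be sketched as a small simulation. Everything specific here is an assumption made for illustration rather than the paper's simulation design: the squared-exponential temporal kernel, the exchangeable cross-domain correlation, the exponential transformation, the cutoff values, the form of the truncation, and the 60% observation rate.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_grid, n_domains = 200, 15, 4
t = np.linspace(7.0, 22.0, n_grid)  # time of day in hours (assumed grid)

# Latent multivariate Gaussian process with unit marginal variance:
# temporal kernel (assumed squared exponential) crossed with an
# assumed exchangeable cross-domain correlation of 0.4.
K_time = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 3.0) ** 2)
R_dom = 0.6 * np.eye(n_domains) + 0.4 * np.ones((n_domains, n_domains))
cov = np.kron(R_dom, K_time) + 1e-8 * np.eye(n_domains * n_grid)

z = rng.multivariate_normal(np.zeros(n_domains * n_grid), cov, size=n_subjects)
z = z.reshape(n_subjects, n_domains, n_grid)  # subject x domain x time

# Type-specific maps from the latent process to observed data.
continuous = np.exp(z[:, 0])                       # monotone transformation
truncated = np.maximum(z[:, 1] - 0.3, 0.0)         # zero below an assumed cutoff
cutoffs = np.array([-1.5, -0.8, -0.2, 0.4, 1.0, 1.6])
ordinal = np.digitize(z[:, 2], cutoffs) + 1        # levels 1 to 7
binary = (z[:, 3] > 0.5).astype(int)               # thresholded indicator

# Irregular sampling: each subject is observed at a random subset of times.
mask = rng.random((n_subjects, n_grid)) < 0.6
print(ordinal[0][mask[0]], t[mask[0]])
```

The latent dependence lives entirely in `cov`; the observed variables differ only in how each latent curve is mapped to its measurement scale.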

Theorems & Definitions (7)

  • Definition 2.1: Latent multivariate Gaussian process
  • Definition 2.2: Multivariate latent nonparanormal functional process
  • Definition 2.3: Multivariate generalized latent nonparanormal process (MGLNPP)
  • Remark 2.4: Identifiability
  • Remark 2.5: Relation to latent Gaussian process models
  • Remark 2.6: Finite-dimensional implication under irregular sampling
  • Theorem B.1: Bridging functions for mixed data types