Statistical description and dimension reduction of continuous time categorical trajectories with multivariate functional principal components
Hervé Cardot, Caroline Peltier
TL;DR
This work introduces a Hilbert-space–based framework for describing continuous-time categorical trajectories: each state is associated with a binary indicator process, and multivariate functional principal components analysis (MFPCA) is applied to the resulting vector of processes. By treating $\mathbf{X}(t)=(X_1(t),\dots,X_q(t))$ as a random element of $\mathbb{H}=L^2([0,T],\mathbb{R}^q)$, the authors derive the mean and cross-covariance structures, establish consistency of the estimators, and obtain a tractable additive Karhunen–Loève expansion that yields interpretable principal components. The methodology is illustrated on sensory data (TDS/TCATA), showing that a small number of components capture the main variation and differentiate stimuli. Comparisons with categorical functional data analysis (CFDA), the continuous-time extension of correspondence analysis, highlight advantages in interpretability and in robustness to zero-probability states. The approach enables simple, interpretable dimension reduction for individual trajectories and supports predictive analyses based on principal component scores. The work provides theoretical guarantees, practical estimators, and open-source code for applying MFPCA to categorical trajectories in various domains.
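To make the objects named above concrete, the mean, cross-covariance and Karhunen–Loève expansion of the indicator processes can be written out as follows. The symbols $\mu_j$, $C_{jk}$, $\xi_r$ and $\boldsymbol{\phi}_r$ are notation assumed here for illustration and may differ from the paper's. The first two identities use only the fact that each $X_j(t)$ takes values in $\{0,1\}$; the expansion is the standard one in $\mathbb{H}$.

$$
\mu_j(t)=\mathbb{E}[X_j(t)]=\mathbb{P}\big(X_j(t)=1\big),
\qquad
C_{jk}(s,t)=\mathbb{P}\big(X_j(s)=1,\,X_k(t)=1\big)-\mu_j(s)\,\mu_k(t),
$$

$$
\mathbf{X}(t)=\boldsymbol{\mu}(t)+\sum_{r\ge 1}\xi_r\,\boldsymbol{\phi}_r(t),
\qquad
\xi_r=\sum_{j=1}^{q}\int_0^T\big(X_j(t)-\mu_j(t)\big)\,\phi_{r,j}(t)\,dt .
$$

In particular, $C_{jk}(s,t)=0$ exactly when the events $\{X_j(s)=1\}$ and $\{X_k(t)=1\}$ are independent, which is the sense in which the covariance measures departure from independence of the joint probabilities.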
Abstract
Tools that allow simple representation and comparison of a set of categorical trajectories are of major interest to statisticians. Without losing any information, we associate with each state a binary random indicator function, taking values in $\{0,1\}$, and turn the problem of statistical description of the categorical trajectories into a multivariate functional principal components analysis. This viewpoint encompasses experimental frameworks where two or more states can be observed simultaneously. Since the sample paths are piecewise constant, with a finite number of jumps, this is a rare case in functional data analysis in which the trajectories are not assumed to be continuous and can be observed exhaustively. Under the weak hypothesis that the $\{0,1\}$-valued trajectories are continuous in probability, the mean functions and the (multivariate) covariance function are continuous and can be interpreted in terms of the departure of the joint probabilities from independence. Adopting a functional data point of view, we show that the binary trajectories, which are right-continuous functions with left-hand limits, can be seen as random elements of the Hilbert space of square integrable functions. The multivariate functional principal components are simple to interpret, and we show that consistent estimators of the mean trajectories and the covariance functions can be obtained under weak regularity assumptions. The ability of the approach to represent categorical trajectories in a low-dimensional space is illustrated on a data set of sensory perceptions from several gustometer-controlled stimuli experiments.
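The encoding and decomposition described in the abstract can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' implementation or their open-source code: the trajectories are simulated, exactly one state is active at a time, the indicator curves are evaluated on a regular midpoint grid, and integrals over $[0,T]$ are replaced by Riemann sums. The helper names `indicator_encode` and `mfpca` are hypothetical.

```python
import numpy as np

def indicator_encode(jump_times, states, q, grid):
    """Encode one piecewise-constant categorical path as q binary curves.

    jump_times is increasing and starts at 0; states[i] is the state held
    on [jump_times[i], jump_times[i+1]). Returns an array of shape (q, len(grid)).
    """
    jump_times = np.asarray(jump_times)
    states = np.asarray(states)
    # Index of the last jump at or before each grid point (right-continuous path).
    idx = np.searchsorted(jump_times, grid, side="right") - 1
    current = states[idx]
    return (current[None, :] == np.arange(q)[:, None]).astype(float)

def mfpca(X, dt, n_comp):
    """Multivariate FPCA of indicator curves X with shape (n, q, m).

    The inner product of L^2([0,T], R^q) is approximated by a Riemann sum
    with step dt (an assumption of this sketch).
    """
    n, q, m = X.shape
    mean = X.mean(axis=0)  # empirical estimate of P(X_j(t) = 1)
    Z = (X - mean).reshape(n, q * m) * np.sqrt(dt)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s[:n_comp] ** 2 / n  # variances of the scores
    eigfuns = (Vt[:n_comp] / np.sqrt(dt)).reshape(n_comp, q, m)
    scores = U[:, :n_comp] * s[:n_comp]  # <X_i - mean, phi_r> for each path
    return mean, eigvals, eigfuns, scores

# Simulated data: n paths over q states on [0, T] (illustrative only).
rng = np.random.default_rng(0)
n, q, m, T, n_comp = 100, 3, 200, 1.0, 3
dt = T / m
grid = (np.arange(m) + 0.5) * dt  # midpoints of a regular partition

paths = []
for _ in range(n):
    k = rng.integers(1, 5)  # number of jumps after time 0
    jumps = np.concatenate(([0.0], np.sort(rng.uniform(0.0, T, size=k))))
    states = rng.integers(0, q, size=k + 1)
    paths.append(indicator_encode(jumps, states, q, grid))
X = np.stack(paths)  # shape (n, q, m)

mean, eigvals, eigfuns, scores = mfpca(X, dt, n_comp)
total_var = ((X - mean) ** 2).sum() * dt / n
print("share of variance per component:", np.round(eigvals / total_var, 3))
```

Each row of `scores` gives the coordinates of one trajectory in the reduced space, so trajectories can be plotted or compared using a few numbers. Because the paths are piecewise constant, the integrals could also be computed exactly from the jump times; the grid is used here only to keep the sketch short.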
