Statistical description and dimension reduction of continuous time categorical trajectories with multivariate functional principal components

Hervé Cardot, Caroline Peltier

TL;DR

This work introduces a Hilbert-space–based framework to describe continuous-time categorical trajectories by associating each state with a binary indicator process and applying multivariate functional principal components analysis (MFPCA). By treating the vector of binary processes $\mathbf{X}(t)=(X_1(t),\dots,X_q(t))$ within $\mathbb{H}=L^2([0,T],\mathbb{R}^q)$, the authors derive mean and cross-covariance structures, establish consistency of estimators, and obtain a tractable additive Karhunen–Loève expansion that yields interpretable principal components. The methodology is illustrated on sensory data (TDS/TCATA) showing that a small number of components capture the main variation and differentiate stimuli, with comparisons to continuous-time correspondence analysis (CFDA) highlighting advantages in interpretability and robustness to zero-probability states. The approach enables simple, interpretable dimension reduction for individual trajectories and supports predictive analyses using principal component scores. The work provides theoretical guarantees, practical estimators, and open-source code for applying MFPCA to categorical trajectories in various domains.

Abstract

Getting tools that allow simple representations and comparisons of a set of categorical trajectories is of major interest for statisticians. Without losing any information, we associate to each state a binary random indicator function, taking values in $\{0,1\}$, and turn the problem of statistical description of the categorical trajectories into a multivariate functional principal components analysis. This viewpoint encompasses experimental frameworks where two or more states can be observed simultaneously. The sample paths being piecewise constant, with a finite number of jumps, this is a rare case in functional data analysis in which the trajectories are not supposed to be continuous and can be observed exhaustively. Under the weak hypothesis assuming only continuity in probability of the $0$-$1$ trajectories, the means and the (multivariate) covariance function are continuous and have interpretations in terms of departure from independence of the joint probabilities. Considering a functional data point of view, we show that the binary trajectories, which are right-continuous functions with left-hand limits, can be seen as random elements in the Hilbert space of square integrable functions. The multivariate functional principal components are simple to interpret and we show that we can get consistent estimators of the mean trajectories and the covariance functions under weak regularity assumptions. The ability of the approach to represent categorical trajectories in a low-dimensional space is illustrated on a data set of sensory perceptions, considering different gustometer-controlled stimuli experiments.
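
To make the indicator encoding concrete, here is a minimal Python sketch of the idea described in the abstract. It is not the authors' released code. The function names (`encode_indicators`, `mfpca`), the input layout (jump times plus visited states), and the evaluation on a regular grid with a Riemann-sum approximation of the $L^2$ inner product are all assumptions made for illustration; the paper itself notes that the paths can be observed exhaustively, so an exact computation without a grid is possible. Since each $X_j(t)$ is a $0$-$1$ variable, its mean is $p_j(t)=P(X_j(t)=1)$ and the cross-covariance is $P(X_j(s)=1, X_\ell(t)=1)-p_j(s)\,p_\ell(t)$, which is the "departure from independence" reading mentioned above.

```python
import numpy as np


def encode_indicators(jump_times, states, q, grid):
    """Evaluate the q indicator functions of one categorical path on a grid.

    Assumed (hypothetical) input layout: jump_times[k] is the time at which
    the path enters states[k], with jump_times[0] <= grid[0]. The path is
    piecewise constant and right-continuous.
    """
    jump_times = np.asarray(jump_times, dtype=float)
    states = np.asarray(states, dtype=int)
    if jump_times[0] > grid[0]:
        raise ValueError("the path must be defined from the first grid point")
    # Index of the last jump at or before each grid point (right-continuity).
    idx = np.searchsorted(jump_times, grid, side="right") - 1
    X = np.zeros((q, grid.size))
    X[states[idx], np.arange(grid.size)] = 1.0
    return X


def mfpca(X, dt, n_components):
    """Grid-based multivariate functional PCA of indicator trajectories.

    X has shape (n, q, m): n trajectories, q states, m grid points with
    regular step dt. Integrals over [0, T] are approximated by Riemann sums.
    """
    n, q, m = X.shape
    p_hat = X.mean(axis=0)  # empirical probabilities p_j(t), shape (q, m)
    # Centre, stack the q functions, and weight by sqrt(dt) so that the
    # Euclidean inner product approximates the inner product of L^2([0,T], R^q).
    Z = (X - p_hat).reshape(n, q * m) * np.sqrt(dt)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / n  # eigenvalues of the empirical covariance operator
    scores = U[:, :n_components] * s[:n_components]
    # Rescale so that each component has unit norm in the function space.
    components = Vt[:n_components].reshape(n_components, q, m) / np.sqrt(dt)
    return p_hat, eigvals, scores, components


# Toy data (simulated, not from the paper): each path starts in state 0
# and jumps once, at a random time, to state 1 or 2.
rng = np.random.default_rng(0)
T, m, q, n = 1.0, 200, 3, 50
dt = T / m
grid = (np.arange(m) + 0.5) * dt  # midpoints of the grid cells
X = np.stack([
    encode_indicators([0.0, rng.uniform(0.1, 0.9)], [0, rng.integers(1, 3)], q, grid)
    for _ in range(n)
])
p_hat, eigvals, scores, components = mfpca(X, dt, n_components=2)
explained = eigvals[:2].sum() / eigvals.sum()  # share of total variance
```

The rows of `scores` give a two-dimensional representation of each trajectory, and `explained` plays the role of the proportion of total variance reported for the principal components. When several states can be active at once (as in TCATA-type data), the same `mfpca` function applies unchanged; only the encoding step would need to set more than one indicator to one at a given time.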

Paper Structure

This paper contains 17 sections, 6 theorems, 42 equations, 22 figures, 4 tables.

Key Result

Proposition 2.1

Under hypothesis $\mathbf{H}_1$, we have for all $j \in \{1, \ldots, q\}$,

Figures (22)

  • Figure 1: TDS bandplot for $n=150$ tasting experiments and $q=8$ states, considering three different gustometer-controlled stimuli, S06, S07 and S04, extracted from the open database BNV2023. Each row corresponds to a categorical trajectory.
  • Figure 2: Three gustometer-controlled stimuli (S06, S07 and S04) extracted from the open database BNV2023.
  • Figure 3: Empirical probabilities $\widehat{p}_j(t)$, $t \in [0,1]$. The curves are drawn only for the states $j$ whose average probability of occurrence is larger than 5%, that is to say $\int_0^1 \widehat{p}_j (t) dt \geq 0.05$.
  • Figure 4: Proportion of total variance captured by the principal components in $\mathbb{H}$.
  • Figure 5: Estimated principal component scores. Different gray levels are used to distinguish the observations according to the set of gustometer-controlled stimuli.
  • ...and 17 more figures

Theorems & Definitions (15)

  • Proposition 2.1
  • proof
  • Remark 1
  • Remark 2
  • Definition 1
  • Proposition 2.2
  • proof
  • Proposition 2.3
  • proof
  • Proposition 3.1
  • ...and 5 more