Functional Singular Value Decomposition

Jianbin Tan, Pixu Shi, Anru R. Zhang

TL;DR

FSVD introduces a unified, low-rank framework for heterogeneous functional data, defining $[X_1,\ldots,X_n]^\top = \sum_{r=1}^{R} \rho_r \bm{a}_r \phi_r$ with orthonormal components and establishing existence plus basic properties. It extends FPCA and factor-model viewpoints to irregularly observed and non-identically distributed data via an RKHS-based rank-one kernel ridge regression and an alternating minimization algorithm, with rigorous convergence guarantees for the first component. By introducing intrinsic basis functions and intrinsic basis vectors, FSVD captures both functional and tabular heterogeneity, enabling functional completion, clustering, and regression without covariance estimation of the data. Through simulations and real data (COVID-19 trajectories and ICU EHRs), FSVD shows superior performance in pattern discovery, missing data completion, and predictive tasks, offering a flexible toolkit for a wide range of functional-data analyses.

Abstract

Heterogeneous functional data commonly arise in time series and longitudinal studies. To uncover the statistical structures of such data, we propose Functional Singular Value Decomposition (FSVD), a unified framework encompassing various tasks for the analysis of functional data with potential heterogeneity. We establish the mathematical foundation of FSVD by proving its existence and providing its fundamental properties. We then develop an implementation approach for noisy and irregularly observed functional data based on a novel alternating minimization scheme and provide theoretical guarantees for its convergence and estimation accuracy. The FSVD framework also introduces the concepts of intrinsic basis functions and intrinsic basis vectors, representing two fundamental structural aspects of random functions. These concepts enable FSVD to provide new and improved solutions to tasks including functional principal component analysis, factor models, functional clustering, functional linear regression, and functional completion, while effectively handling heterogeneity and irregular temporal sampling. Through extensive simulations, we demonstrate that FSVD-based methods consistently outperform existing methods across these tasks. To showcase the value of FSVD in real-world datasets, we apply it to extract temporal patterns from a COVID-19 case count dataset and perform data completion on an electronic health record dataset.
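
The alternating minimization idea for noisy, irregularly observed curves can be illustrated with a minimal rank-one sketch. It is not the authors' algorithm: the Gaussian kernel, its bandwidth `bw`, the unweighted squared loss, the penalty `nu`, the subject-mean initialization, and all function names are assumptions chosen for illustration. Each iteration fixes the vector $\bm{a}$ and fits $\phi$ by a kernel ridge regression on the pooled observations, then fixes $\phi$ and updates each entry of $\bm{a}$ by least squares.

```python
import numpy as np

def gauss_kernel(s, t, bw=0.2):
    # Gaussian kernel matrix between two sets of time points (assumed kernel).
    return np.exp(-0.5 * ((s[:, None] - t[None, :]) / bw) ** 2)

def rank_one_fsvd(times, values, nu=1e-3, n_iter=20):
    """Hypothetical rank-one fit: values[i][j] ~ a[i] * phi(times[i][j])."""
    n = len(values)
    t_all = np.concatenate(times)
    y_all = np.concatenate(values)
    subj = np.repeat(np.arange(n), [len(t) for t in times])
    K = gauss_kernel(t_all, t_all)

    def fit_phi(a):
        # With phi = sum_k c_k K(., t_k), the penalized least-squares
        # problem reduces to (D^2 K + nu I) c = D y, where D = diag(a[subj]).
        d = a[subj]
        lhs = (d ** 2)[:, None] * K + nu * np.eye(len(t_all))
        return np.linalg.solve(lhs, d * y_all)

    # Initialize a from subject-level means (an assumption of this sketch).
    a = np.array([v.mean() for v in values])
    a /= np.linalg.norm(a)
    for _ in range(n_iter):
        phi_obs = K @ fit_phi(a)  # phi evaluated at all observed times
        # Per-subject least squares for a_i given phi, then renormalize.
        num = np.bincount(subj, weights=y_all * phi_obs, minlength=n)
        den = np.bincount(subj, weights=phi_obs ** 2, minlength=n)
        a = num / np.maximum(den, 1e-12)
        a /= np.linalg.norm(a)
    coef = fit_phi(a)

    def phi(t):
        return gauss_kernel(np.atleast_1d(t), t_all) @ coef

    return a, phi

# Simulated example: 30 subjects, 5 to 10 irregular time points each.
rng = np.random.default_rng(1)
n = 30
a_true = rng.normal(size=n)
times = [np.sort(rng.uniform(0, 1, rng.integers(5, 11))) for _ in range(n)]
phi_true = lambda t: 1.0 + 0.5 * np.sin(2 * np.pi * t)
values = [a_true[i] * phi_true(times[i]) + 0.05 * rng.normal(size=len(times[i]))
          for i in range(n)]

a_hat, phi_hat = rank_one_fsvd(times, values)
cos = abs(a_hat @ a_true) / np.linalg.norm(a_true)
print("absolute cosine between estimated and true a:", cos)
```

Because $\bm{a}$ is kept at unit norm, the scale of the fitted function plays the role of the singular value; it could be approximated by the $L^2$ norm of `phi_hat` on a fine grid.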

Paper Structure

This paper contains 46 sections, 20 theorems, 250 equations, 6 figures, 2 tables, 5 algorithms.

Key Result

Theorem 1

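Suppose $X_1,\ldots, X_n \in \mathcal{H}$. Then there exists an FSVD of $X_1,\ldots, X_n$:

$$[X_1,\ldots,X_n]^\top = \sum_{r=1}^{R} \rho_r \bm{a}_r \phi_r,$$

where $\rho_1 \geq \dots\geq \rho_R >0$ are singular values, $\bm{a}_1,\dots,\bm{a}_R\in \mathbb{R}^n$ are singular vectors, $\phi_1,\dots,\phi_R\in \mathcal{H}$ are singular functions, and $R\leq n$ is the rank. Here, $\bm{a}_1,\dots,\bm{a}_R$ and $\phi_1,\dots,\phi_R$ are orthonormal in the sense that $\bm{a}_r^\top \bm{a}_{r'} = \langle \phi_r, \phi_{r'} \rangle = \mathbb{1}\{r = r'\}$ for all $r, r'$, where $\langle \cdot, \cdot \rangle$ denotes the inner product on $\mathcal{H}$.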
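
For intuition, the decomposition in Theorem 1 can be checked numerically in the simplest setting, where all curves are observed without noise on a shared uniform grid. The sketch below assumes $\mathcal{H}$ is an $L^2$ space on an interval and approximates its inner product by a Riemann sum; the function name `fsvd_dense` and the rank tolerance are hypothetical. It does not cover the irregular, noisy setting the paper targets.

```python
import numpy as np

def fsvd_dense(X, grid, tol=1e-10):
    """X: (n, J) array of curve values on a shared uniform grid."""
    n, J = X.shape
    w = (grid[-1] - grid[0]) / (J - 1)  # grid spacing used as quadrature weight
    # An ordinary SVD of the weight-scaled matrix respects the discrete L2 inner product.
    U, s, Vt = np.linalg.svd(X * np.sqrt(w), full_matrices=False)
    R = int(np.sum(s > tol * s[0]))     # numerical rank, at most n
    rho = s[:R]                         # singular values, in decreasing order
    A = U[:, :R]                        # columns are the singular vectors a_r
    Phi = Vt[:R] / np.sqrt(w)           # rows are phi_r evaluated on the grid
    return rho, A, Phi, w

# Five curves spanned by two functions, so the rank should be 2.
grid = np.linspace(0.0, 1.0, 201)
rng = np.random.default_rng(0)
coef = rng.normal(size=(5, 2))
X = coef @ np.vstack([np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)])

rho, A, Phi, w = fsvd_dense(X, grid)
X_rebuilt = (A * rho) @ Phi             # sum_r rho_r a_r phi_r on the grid
print("rank:", len(rho))
print("max reconstruction error:", np.abs(X_rebuilt - X).max())
```

Dividing the right singular vectors by $\sqrt{w}$ makes the rows of `Phi` orthonormal under the weighted sum $w \sum_j \phi_r(t_j)\phi_{r'}(t_j)$, the discrete counterpart of the inner product on $\mathcal{H}$ assumed here.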

Figures (6)

  • Figure 1: A pictorial illustration of FSVD: images on the horizontal ($x$-$y$) plane represent the FSVD of irregularly observed functional data, while the curves along the vertical ($z$) axis illustrate the smooth nature of functional data.
  • Figure 2: An illustration of tasks associated with FSVD.
  • Figure 3: (A): The $\text{NMSE}_{X}$ of functional completion for different methods with sample sizes $n$ (main title) and numbers of time points $J_i$ (subtitle). (B): Box-plots of ARI of functional clustering for different methods with sample sizes $n$ and numbers of time points $J_i$. (C): Functional coefficients of functional regression estimated from different methods with different numbers of time points $J_i$. The solid and dotted lines indicate the true functional coefficients and the point-wise means of the estimated functional coefficients from simulation, respectively. The shaded regions represent the 95% point-wise interval calculated from simulation. (D): The $\text{NMSE}_{A}$ of factor model loadings for different methods with sample sizes $n$ and numbers of time points $J_i$.
  • Figure 4: (A): Irregularly observed data across different regions; (B): estimated intrinsic basis functions (IBFs) from FSVD; (C): estimated mean function after normalization (MF) and estimated eigenfunctions (EFs) from FPCA; (D): Clustering map for the dynamics from different regions; (E): Estimated mean functions of two clusters.
  • Figure 5: (A) Longitudinal data for 12 clinical features from a patient and their functional completion by FSVD. (B) The estimated factor series and (C) the corresponding factor loadings for the electronic health record from a patient.
  • ...and 1 more figure

Theorems & Definitions (44)

  • Theorem 1: Existence and Basic Properties of Functional Singular Value Decomposition
  • Theorem 2: Sequential Formation of FSVD
  • Remark 1: Connections to existing functional data/kernel ridge regression/SVD methods
  • Theorem 3
  • Theorem 4
  • Definition 1: Intrinsic Basis Functions
  • Theorem 5
  • Corollary 1
  • Remark 2: Comparison of Intrinsic Basis Functions, FSVD, and FPCA and Separability
  • Definition 2: Intrinsic Basis Vectors
  • ...and 34 more