Multiple Linked Tensor Factorization

Zhiyu Kang, Raghavendra B. Rao, Eric F. Lock

TL;DR

MULTIFAC integrates multi-source, multi-way data by extending the CP decomposition with $L_2$ penalties on the factor matrices, which induce rank sparsity and automatically separate shared from dataset-specific structures. It introduces an EM-ALS framework to handle missing data and a two-step cross-validation strategy for tuning the rank and penalty. In extensive simulations, it recovers latent signals and imputes missing entries more accurately than baseline methods, and it yields an interpretable decomposition in a real study of early-life iron deficiency that links hematology and MRI data. The approach provides a practical tool for multi-tensor data integration, with a clear interpretation of shared versus individual signals and built-in missing-data imputation.

Abstract

In biomedical research and other fields, it is now common to generate high content data that are both multi-source and multi-way. Multi-source data are collected from different high-throughput technologies while multi-way data are collected over multiple dimensions, yielding multiple tensor arrays. Integrative analysis of these data sets is needed, e.g., to capture and synthesize different facets of complex biological systems. However, despite growing interest in multi-source and multi-way factorization techniques, methods that can handle data that are both multi-source and multi-way are limited. In this work, we propose a Multiple Linked Tensors Factorization (MULTIFAC) method extending the CANDECOMP/PARAFAC (CP) decomposition to simultaneously reduce the dimension of multiple multi-way arrays and approximate underlying signal. We first introduce a version of the CP factorization with L2 penalties on the latent factors, leading to rank sparsity. When extended to multiple linked tensors, the method automatically reveals latent components that are shared across data sources or individual to each data source. We also extend the decomposition algorithm to its expectation-maximization (EM) version to handle incomplete data with imputation. Extensive simulation studies are conducted to demonstrate MULTIFAC's ability to (i) approximate underlying signal, (ii) identify shared and unshared structures, and (iii) impute missing data. The approach yields an interpretable decomposition on multi-way multi-omics data for a study on early-life iron deficiency.
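To make the penalized CP idea concrete, the following is a minimal sketch for a single 3-way tensor, not the authors' implementation. It assumes the objective is the squared reconstruction error plus `sigma` times the sum of squared Frobenius norms of the factor matrices (the exact scaling used in the paper may differ), in which case each alternating least squares step becomes a ridge regression. The function names `cp_reconstruct`, `ridge_update`, and `penalized_cp_als` are hypothetical.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    # Sum of R rank-one terms a_r (outer) b_r (outer) c_r.
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def objective(X, factors, sigma):
    # Assumed objective: squared error + sigma * sum of squared factor norms.
    resid = X - cp_reconstruct(*factors)
    return np.sum(resid ** 2) + sigma * sum(np.sum(F ** 2) for F in factors)

def ridge_update(M, G, sigma):
    # Solve F (G + sigma I) = M for F; G + sigma I is symmetric.
    R = G.shape[0]
    return np.linalg.solve(G + sigma * np.eye(R), M.T).T

def penalized_cp_als(X, rank, sigma, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in X.shape)
    history = []
    for _ in range(n_iter):
        # Each update minimizes the objective over one factor, others fixed.
        A = ridge_update(np.einsum('ijk,jr,kr->ir', X, B, C),
                         (B.T @ B) * (C.T @ C), sigma)
        B = ridge_update(np.einsum('ijk,ir,kr->jr', X, A, C),
                         (A.T @ A) * (C.T @ C), sigma)
        C = ridge_update(np.einsum('ijk,ir,jr->kr', X, A, B),
                         (A.T @ A) * (B.T @ B), sigma)
        history.append(objective(X, (A, B, C), sigma))
    return (A, B, C), history

# Toy data: a rank-2 signal plus noise, fitted with a deliberately larger rank.
rng = np.random.default_rng(1)
truth = [rng.standard_normal((n, 2)) for n in (10, 8, 6)]
X = cp_reconstruct(*truth) + 0.1 * rng.standard_normal((10, 8, 6))
factors, history = penalized_cp_als(X, rank=5, sigma=1.0)
col_norms = np.linalg.norm(factors[0], axis=0)  # per-component size in mode 1
```

Because every block update is an exact minimizer of its subproblem, the recorded objective values should not increase from sweep to sweep. Inspecting `col_norms` is one way to see whether surplus components have been shrunk toward zero for a given `sigma`.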

Paper Structure

This paper contains 28 sections, 5 theorems, 37 equations, 4 figures, 7 tables, 3 algorithms.

Key Result

Proposition 1

Let $\mathbf{X} = \tilde{\mathbf{A}}_1 \mathbf{D} \tilde{\mathbf{A}}_2^{\top}$ be the SVD of $\mathbf{X}$. Then the optimal solution $\hat{\mathbf{X}}$ to problem (svd_nuclear) is given by $\hat{\mathbf{X}} = \tilde{\mathbf{A}}_1 \hat{\mathbf{D}} \tilde{\mathbf{A}}_2^{\top}$, where the diagonal elements of $\hat{\mathbf{D}}$ are $\hat{d}_r = \max(d_r - \sigma, 0)$, $r = 1, \ldots, R$.
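The singular-value soft-thresholding in Proposition 1 can be sketched in a few lines of NumPy. This is an illustration only: it assumes the referenced problem is the standard nuclear-norm-penalized least squares problem, scaled so that the threshold equals $\sigma$, and the function name `soft_threshold_svd` is hypothetical.

```python
import numpy as np

def soft_threshold_svd(X, sigma):
    # Thin SVD: X = U diag(d) Vt, with d sorted in decreasing order.
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # Shrink each singular value by sigma and clip at zero.
    d_hat = np.maximum(d - sigma, 0.0)
    # Rebuild the matrix from the original singular vectors.
    X_hat = (U * d_hat) @ Vt
    return X_hat, d_hat

# Toy check: singular values 3 and 1 become 1.5 and 0 when sigma = 1.5.
X_demo = np.diag([3.0, 1.0])
X_hat, d_hat = soft_threshold_svd(X_demo, 1.5)
```

Any singular value at or below $\sigma$ is set exactly to zero, which is how the penalty reduces the rank of the solution rather than merely shrinking it.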

Figures (4)

  • Figure 1: An illustration of MULTIFAC for two linked 3-way tensors. Subfigure (a) demonstrates how MULTIFAC decomposes tensors into shared and individual structures, where the individual components of the two tensors have distinct factor matrices $\mathbf{A}_0^{(\text{indiv, 1})}$ and $\mathbf{A}_0^{(\text{indiv, 2})}$. In practice, however, these individual structures are not explicitly modeled; instead, both tensors are assumed to share the same $\mathbf{A}_0$, as depicted in the left-hand side of subfigure (b). Through penalization, certain columns are shrunk to zero, indicated by white regions, effectively reducing the rank and leading to the emergence of individual structures.
  • Figure 2: Top Sample Loadings in Different Structures
  • Figure 3: Loading plots for the second component of the shared structure
  • Figure 4: Loading plots for the second component of the individual structure for hematology.

Theorems & Definitions (8)

  • Proposition 1
  • Proposition 2
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • proof
  • proof
  • proof