Multiple Linked Tensor Factorization

Zhiyu Kang; Raghavendra B. Rao; Eric F. Lock

TL;DR

MULTIFAC integrates multi-source, multi-way data by extending the CP decomposition with $L_2$ penalties on the factor matrices. The penalties induce rank sparsity and automatically separate structure shared across datasets from structure specific to each dataset. An EM-ALS framework handles missing data, and a two-step cross-validation strategy tunes the rank and penalty parameters. In extensive simulations, the method recovers latent signals and imputes missing entries more accurately than baseline methods. Applied to a real multi-omics study of early-life iron deficiency that links hematology and MRI data, it yields an interpretable decomposition. The approach offers a practical, scalable tool for multi-tensor data integration, with a clear separation of shared and individual signals and built-in missing-data imputation.

Abstract

In biomedical research and other fields, it is now common to generate high-content data that are both multi-source and multi-way. Multi-source data are collected from different high-throughput technologies, while multi-way data are collected over multiple dimensions, yielding multiple tensor arrays. Integrative analysis of these data sets is needed, e.g., to capture and synthesize different facets of complex biological systems. However, despite growing interest in multi-source and multi-way factorization techniques, methods that can handle data that are both multi-source and multi-way are limited. In this work, we propose a Multiple Linked Tensors Factorization (MULTIFAC) method that extends the CANDECOMP/PARAFAC (CP) decomposition to simultaneously reduce the dimension of multiple multi-way arrays and approximate the underlying signal. We first introduce a version of the CP factorization with $L_2$ penalties on the latent factors, leading to rank sparsity. When extended to multiple linked tensors, the method automatically reveals latent components that are shared across data sources or individual to each data source. We also extend the decomposition algorithm to an expectation-maximization (EM) version that handles incomplete data through imputation. Extensive simulation studies demonstrate MULTIFAC's ability to (i) approximate the underlying signal, (ii) identify shared and unshared structures, and (iii) impute missing data. The approach yields an interpretable decomposition of multi-way multi-omics data from a study on early-life iron deficiency.
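
The two ingredients named above, a CP factorization with $L_2$ penalties on the factors and an EM-style imputation loop, can be sketched for a single three-way tensor. This is an illustration under stated assumptions, not the authors' implementation. It assumes the objective is the squared reconstruction error on the filled-in tensor plus `lam` times the sum of squared Frobenius norms of the factor matrices. It omits the linking of multiple tensors and the cross-validation step. All names (`khatri_rao`, `ridge_update`, `penalized_cp_em`, `lam`) are hypothetical.

```python
import numpy as np

def khatri_rao(P, Q):
    # Column-wise Kronecker product; row p * Q.shape[0] + q holds P[p] * Q[q].
    return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])

def cp_reconstruct(A, B, C):
    # Sum of rank-one terms: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r].
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def ridge_update(X_filled, mode, P, Q, lam):
    # Closed-form ridge solution for one factor matrix, holding the other two
    # fixed. P and Q are the remaining factors, in increasing mode order.
    unfolded = np.moveaxis(X_filled, mode, 0).reshape(X_filled.shape[mode], -1)
    gram = (P.T @ P) * (Q.T @ Q) + lam * np.eye(P.shape[1])
    rhs = unfolded @ khatri_rao(P, Q)
    return np.linalg.solve(gram, rhs.T).T

def penalized_cp_em(X, mask, rank, lam, n_iter=200, seed=0):
    # mask is True where X is observed; values of X elsewhere are ignored.
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in X.shape)
    # Assumed initialization: fill missing entries with the observed mean.
    X_filled = np.where(mask, X, X[mask].mean())
    for _ in range(n_iter):
        # M-step: one sweep of penalized alternating least squares.
        A = ridge_update(X_filled, 0, B, C, lam)
        B = ridge_update(X_filled, 1, A, C, lam)
        C = ridge_update(X_filled, 2, A, B, lam)
        # E-step: replace missing entries with the current low-rank fit.
        X_filled = np.where(mask, X, cp_reconstruct(A, B, C))
    return A, B, C

# Toy check: a noise-free rank-2 tensor with about 20% of entries held out,
# fitted with a deliberately over-specified rank of 4.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (15, 12, 10))
X_true = cp_reconstruct(A0, B0, C0)
mask = rng.random(X_true.shape) > 0.2
A, B, C = penalized_cp_em(X_true, mask, rank=4, lam=0.1)
X_hat = cp_reconstruct(A, B, C)
rel_err = np.linalg.norm((X_hat - X_true)[~mask]) / np.linalg.norm(X_true[~mask])
print(f"relative error on held-out entries: {rel_err:.3f}")
```

Printing the column norms, for example `np.linalg.norm(A, axis=0)`, is one way to see whether the penalty has shrunk the surplus components, which is the rank-sparsity behavior the abstract describes. How strongly this happens depends on `lam`, which MULTIFAC tunes by cross-validation rather than fixing in advance.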
