Factor Analysis with Correlated Topic Model for Multi-Modal Data
Małgorzata Łazęcka, Ewa Szczurek
TL;DR
FACTM is a Bayesian framework that unifies factor analysis (FA) with a correlated topic model (CTM) to handle both simple and structured multimodal data. It introduces a sample-specific link variable $\mu_n$ that ties the FA and CTM components, together with a supervised rotation mechanism, to produce interpretable latent factors and meaningful topic structures across views. In simulations and on real datasets (video sentiment, music genres, and COVID-19 scRNA-seq/CT/cytometry), FACTM shows more accurate parameter estimation than competing methods, competitive predictive power, and improved interpretability, including biologically coherent clustering in complex data. The approach supports interpretable integration of heterogeneous data modalities for discovery and classification tasks.
Abstract
Integrating various data modalities brings valuable insights into underlying phenomena. Multimodal factor analysis (FA) uncovers shared axes of variation underlying different simple data modalities, where each sample is represented by a vector of features. However, FA is not suited for structured data modalities, such as text or single-cell sequencing data, where multiple data points are measured for each sample and exhibit a clustering structure. To overcome this challenge, we introduce FACTM, a novel multi-view and multi-structure Bayesian model that combines FA with correlated topic modeling and is optimized using variational inference. Additionally, we introduce a method for rotating latent factors to enhance interpretability with respect to binary features. On text and video benchmarks as well as real-world music and COVID-19 datasets, we demonstrate that FACTM outperforms other methods in identifying clusters in structured data and in integrating them with simple modalities via the inference of shared, interpretable factors.
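
To make the coupling between the two components concrete, the following is a minimal generative sketch, not the authors' implementation. It assumes that shared factors $z_n$ generate a simple view through a linear Gaussian FA model, and that the same factors determine the sample-specific mean $\mu_n$ of the logistic-normal topic proportions in a CTM over discrete, text-like observations. The function name `sample_factm_like`, the loading matrices, the noise scales, and all default sizes are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def sample_factm_like(n_samples=5, n_factors=3, n_features=10, n_topics=4,
                      vocab_size=20, n_points=30, seed=0):
    """Draw toy data from an FA + CTM model linked through mu_n (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Shared latent factors, one vector z_n per sample.
    z = rng.normal(size=(n_samples, n_factors))

    # Simple view: classic FA, y_n = W z_n + Gaussian noise.
    W_simple = rng.normal(size=(n_features, n_factors))
    y = z @ W_simple.T + 0.1 * rng.normal(size=(n_samples, n_features))

    # Assumed link: the FA part produces the sample-specific CTM mean mu_n.
    W_struct = rng.normal(size=(n_topics, n_factors))
    mu = z @ W_struct.T

    # CTM part: eta_n ~ N(mu_n, Sigma), with Sigma encoding topic correlations.
    A = rng.normal(size=(n_topics, n_topics))
    Sigma = A @ A.T / n_topics + 0.1 * np.eye(n_topics)
    eta = mu + rng.multivariate_normal(np.zeros(n_topics), Sigma, size=n_samples)
    theta = softmax(eta)  # per-sample topic proportions

    # Each topic is a distribution over a discrete vocabulary.
    beta = rng.dirichlet(np.ones(vocab_size), size=n_topics)

    # Structured view: many data points per sample, each assigned to a topic.
    counts = np.zeros((n_samples, vocab_size), dtype=int)
    for n in range(n_samples):
        topics = rng.choice(n_topics, size=n_points, p=theta[n])
        for k in topics:
            counts[n, rng.choice(vocab_size, p=beta[k])] += 1

    return {"z": z, "y": y, "mu": mu, "theta": theta, "counts": counts}

data = sample_factm_like()
print(data["y"].shape, data["theta"].shape, data["counts"].shape)
```

In this sketch the structured view is stored as a per-sample count matrix. Inference in FACTM itself is variational, which this forward sampler does not attempt to reproduce.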
