Adaptive partition Factor Analysis

Elena Bortolato, Antonio Canale

TL;DR

APAFA advances multi-study factor analysis by introducing structured shrinkage priors that adaptively partition latent factors into shared and study-specific components using study-aware activation patterns. The method frames the latent structure as a neural-network-like layer with a cumulative shrinkage process to learn the number of factors, and establishes identifiability conditions that separate shared from study-specific contributions. In simulations, APAFA performs robustly across diverse heterogeneity scenarios; in real-data applications to bird co-occurrence and ovarian cancer gene expression, it yields richer, more interpretable insights than existing approaches. The work provides Gibbs-sampling inference, practical prior elicitation guidance, and publicly available code, offering a flexible tool for high-dimensional, multi-study analyses.

Abstract

Factor Analysis has traditionally been used across diverse disciplines to extract latent traits that influence the behavior of multivariate observed variables. Historically, the focus has been on analyzing data from a single study, neglecting the potential study-specific variations present in data from multiple studies. Multi-study factor analysis has emerged as a recent methodological advancement that addresses this gap by distinguishing between latent traits shared across studies and study-specific components arising from artifactual or population-specific sources of variation. In this paper, we extend the current methodologies by introducing novel shrinkage priors for the latent factors, thereby accommodating a broader spectrum of scenarios -- from the absence of study-specific latent factors to models in which factors pertain only to small subgroups nested within or shared between the studies. For the proposed construction we provide conditions for identifiability of factor loadings and guidelines to perform straightforward posterior computation via Gibbs sampling. Through comprehensive simulation studies, we demonstrate that our proposed method exhibits competitive performance across a variety of scenarios compared to existing methods, while providing richer insights. The practical benefits of our approach are further illustrated through applications to bird species co-occurrence data and ovarian cancer gene expression data.

Paper Structure

This paper contains 10 sections, 5 theorems, 24 equations, 8 figures, 2 tables.

Key Result

Theorem 1

For the model defined in eq:marginalphi, if $\Psi_h\ne 1_n$ for all $h \in \{1,\ldots, k\}$ and $\Gamma$ is of full column rank $k$ with $k<p(p+1)/2$, then the model is resistant to information switching.

Figures (8)

  • Figure 1: Multi-study factor model representation: the $n\times p$ data matrix $Y$ (on the left) is written as the product of the latent factor matrix $H$ of dimension $n\times d$ by the factor loading matrix $\Lambda$ (the shared parts) plus the product of the latent factor matrix $\Phi$ of dimension $n\times k$ by the factor loading matrix $\Gamma$ (collecting all the study-specific parts), and a random noise $\epsilon$. Different shades of purple, red, and blue identify the $S=3$ studies in $Y$.
  • Figure 2: Neural network representation: the input nodes are the categorical variables associated with the study structure. The first layer of latent variables consists of the latent study-specific factors.
  • Figure 3: Scenarios' true sparsity pattern (first four plots) and posterior estimate for a generic replicate (last four plots).
  • Figure 4: Monte Carlo distribution of the RV coefficient for the shared variance component under configuration $n<p$ (left panel) and $n>p$ (right panel).
  • Figure 5: ROC curves (left) and distribution of the AUC (right) obtained from posterior probabilities under the APAFA model over 10 independent replicated datasets, illustrating the accuracy of factor-to-unit assignments across scenarios (from top to bottom: A, A$^*$, C, D), with $n>p$.
  • ...and 3 more figures
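
The decomposition described in the Figure 1 caption can be illustrated with a minimal simulation sketch. Everything beyond the matrix dimensions stated in the caption is an assumption made for illustration: the function name `simulate_multi_study`, the orientation of the loadings ($\Lambda$ as $p\times d$ and $\Gamma$ as $p\times k$, so that $Y = H\Lambda^\top + \Phi\Gamma^\top + \epsilon$), the Gaussian draws, and the rule that each study-specific factor is active in exactly one study. It is not the authors' code and does not implement the APAFA prior or its Gibbs sampler.

```python
import numpy as np


def simulate_multi_study(n_per_study=(30, 40, 50), p=8, d=2, k=3,
                         noise_sd=0.5, seed=0):
    """Simulate Y = H Lambda^T + Phi Gamma^T + eps (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_studies = len(n_per_study)

    # Study label of each row of Y (0, 1, ..., S-1), stacked study by study.
    study = np.repeat(np.arange(n_studies), n_per_study)
    n = study.size

    # Shared part: factors H (n x d) and loadings Lambda (assumed p x d).
    H = rng.standard_normal((n, d))
    Lambda = rng.standard_normal((p, d))

    # Study-specific part: factor h is switched on only for units in
    # study h mod S (an illustrative activation pattern, not the paper's prior).
    active = study[:, None] == (np.arange(k) % n_studies)[None, :]
    Phi = rng.standard_normal((n, k)) * active
    Gamma = rng.standard_normal((p, k))

    # Idiosyncratic noise with a common standard deviation (assumption).
    eps = noise_sd * rng.standard_normal((n, p))

    Y = H @ Lambda.T + Phi @ Gamma.T + eps
    return Y, H, Lambda, Phi, Gamma, study


if __name__ == "__main__":
    Y, H, Lambda, Phi, Gamma, study = simulate_multi_study()
    print("Y shape:", Y.shape)
    print("rank of Gamma:", np.linalg.matrix_rank(Gamma))
```

In this sketch the zero pattern of `Phi` plays the role of the study-aware activation: rows belonging to a study in which a factor is inactive contribute nothing through the corresponding column of `Gamma`.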

Theorems & Definitions (10)

  • Definition 1
  • Theorem 1
  • proof
  • Corollary 1
  • Lemma 1
  • Lemma 2
  • proof
  • Definition 2: Non-replicable Sparsity Pattern Condition
  • Theorem 2
  • proof