Adaptive Partition Factor Analysis
Elena Bortolato, Antonio Canale
TL;DR
APAFA advances multi-study factor analysis by introducing structured shrinkage priors that adaptively partition latent factors into shared and study-specific components using study-aware activation patterns. The method frames the latent structure as a neural-network-like layer, uses a cumulative shrinkage process to learn the number of factors, and establishes conditions under which shared and study-specific contributions are identifiable. In simulations, APAFA performs robustly across diverse heterogeneity scenarios, and in real-data applications to bird species co-occurrence and ovarian cancer gene expression it yields richer, more interpretable insights than existing approaches. The work provides a Gibbs sampler for posterior inference, practical guidance on prior elicitation, and publicly available code, offering a flexible tool for high-dimensional multi-study analyses.
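The cumulative shrinkage process mentioned above can be illustrated with a short sampler. This is a hedged sketch of the truncated cumulative shrinkage process (CUSP) prior of Legramanti, Durante and Dunson (2020), assuming APAFA relies on a prior of this family; its exact specification, hyperparameters, and coupling with the study-aware activation patterns may differ. The function name `sample_cusp` and all hyperparameter values are hypothetical. Each loading column h receives a variance theta_h that is either a slab draw or the small spike value `theta_inf`; columns sitting in the spike are effectively switched off.

```python
import numpy as np

def sample_cusp(H, alpha, a_theta, b_theta, theta_inf, rng):
    """Draw column variances theta_1..theta_H from a truncated CUSP prior.

    Hypothetical helper for illustration; not the APAFA code.
    """
    # Stick-breaking: v_l ~ Beta(1, alpha), w_l = v_l * prod_{m<l} (1 - v_m)
    v = rng.beta(1.0, alpha, size=H)
    v[-1] = 1.0  # truncation: the last stick absorbs the remaining mass
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    # Spike probabilities pi_h = sum_{l<=h} w_l are nondecreasing in h
    pi = np.cumsum(w)
    # Slab: inverse-gamma(a_theta, b_theta), via 1 / Gamma(shape=a, rate=b)
    slab = 1.0 / rng.gamma(a_theta, 1.0 / b_theta, size=H)
    # Column h is in the spike (point mass at theta_inf) with probability pi_h
    is_spike = rng.uniform(size=H) < pi
    theta = np.where(is_spike, theta_inf, slab)
    return theta, pi, is_spike

# Usage with assumed hyperparameters
rng = np.random.default_rng(0)
theta, pi, is_spike = sample_cusp(H=15, alpha=5.0, a_theta=2.0,
                                  b_theta=2.0, theta_inf=0.05, rng=rng)
print("active factors:", int((~is_spike).sum()))
```

Because the spike probabilities increase with the column index, later columns are increasingly likely to be shrunk, which lets the prior adapt the effective number of factors rather than fixing it in advance.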
Abstract
Factor analysis has traditionally been used across diverse disciplines to infer latent traits that drive the behavior of multivariate observed variables. Historically, the focus has been on data from a single study, neglecting the study-specific variation that can arise when data come from multiple studies. Multi-study factor analysis is a recent methodological advance that addresses this gap by distinguishing latent traits shared across studies from study-specific components arising from artifactual or population-specific sources of variation. In this paper, we extend current methodologies by introducing novel shrinkage priors for the latent factors, thereby accommodating a broader spectrum of scenarios, ranging from the absence of study-specific latent factors to models in which factors pertain only to small subgroups nested within or shared between studies. For the proposed construction, we provide conditions for the identifiability of the factor loadings and guidelines for straightforward posterior computation via Gibbs sampling. Through comprehensive simulation studies, we show that the proposed method performs competitively with existing methods across a variety of scenarios while providing richer insights. The practical benefits of our approach are further illustrated through applications to bird species co-occurrence data and ovarian cancer gene expression data.
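To make the partition into shared, study-specific, and subgroup-specific factors concrete, the sketch below simulates multi-study data from a factor model with one loading matrix `Lambda` and a hand-specified binary activation matrix `active`, where `active[s, h] = 1` means factor h operates in study s. This parameterization, the Gaussian assumptions, and the function names `simulate_multistudy` and `implied_covariance` are illustrative choices, not the APAFA specification: in the paper the partition is learned through shrinkage priors rather than fixed by hand. Under this model the marginal covariance in study s is Lambda diag(z_s) Lambda^T + diag(sigma2).

```python
import numpy as np

def simulate_multistudy(n_per_study, Lambda, active, sigma2, rng):
    """Simulate y_i = Lambda @ (z_s * eta_i) + eps_i for each study s.

    Lambda : (p, K) loadings common to all studies
    active : (S, K) 0/1 matrix; active[s, h] = 1 if factor h operates in study s
    sigma2 : (p,) idiosyncratic variances
    """
    p, K = Lambda.shape
    Y, study = [], []
    for s, n_s in enumerate(n_per_study):
        eta = rng.standard_normal((n_s, K))      # latent factors ~ N(0, I_K)
        eta_masked = eta * active[s]             # switch off factors inactive in study s
        eps = rng.standard_normal((n_s, p)) * np.sqrt(sigma2)
        Y.append(eta_masked @ Lambda.T + eps)
        study.extend([s] * n_s)
    return np.vstack(Y), np.array(study)

def implied_covariance(Lambda, active_s, sigma2):
    """Marginal covariance in one study: Lambda diag(z_s) Lambda^T + diag(sigma2)."""
    L_s = Lambda * active_s                      # zero out inactive columns
    return L_s @ L_s.T + np.diag(sigma2)

# Usage: factors 0-1 shared by all studies, factor 2 specific to study 0,
# factor 3 shared only by studies 1 and 2
rng = np.random.default_rng(0)
p, K = 10, 4
Lambda = rng.normal(size=(p, K))
active = np.array([[1, 1, 1, 0],
                   [1, 1, 0, 1],
                   [1, 1, 0, 1]])
sigma2 = np.full(p, 0.5)
Y, study = simulate_multistudy([200, 150, 100], Lambda, active, sigma2, rng)
print(Y.shape)  # (450, 10)
```

The covariance matrices of two studies differ only through the factors active in one but not the other, which is the signal a multi-study method must separate from the shared structure.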
