Nonparametric Factor Analysis and Beyond

Yujia Zheng; Yang Liu; Jiaxiong Yao; Yingyao Hu; Kun Zhang

TL;DR

The paper addresses latent-variable identifiability in general nonparametric, nonlinear, and noisy settings, building on a Hu–Schennach-inspired framework. It proves that the generative model's distribution is identifiable and, under structural or distributional variability, that the latent factors are identifiable component-wise, even when the generating function is noninvertible and the noise is non-negligible. To put these results into practice, it introduces two estimation approaches: GEEN, a KL-divergence-based method that uses kernel density estimates for univariate latents, and a Regularized Autoencoder (RAE) that combines a conditional-independence constraint with likelihood-based learning for multivariate latents. Empirical validation covers simulations with continuous and discrete latents and a real-world GDP refinement experiment, in which the estimated latent GDP growth appears more informative about the economies than official measures. Together, these results connect general identifiability theory to practical latent-variable estimation and real-world applications.

Abstract

Nearly all identifiability results in unsupervised representation learning, inspired by, e.g., independent component analysis, factor analysis, and causal representation learning, rely on assumptions of additive independent noise or noiseless regimes. In contrast, we study the more general case where noise can take arbitrary forms, depend on latent variables, and be non-invertibly entangled within a nonlinear function. We propose a general framework for identifying latent variables in nonparametric noisy settings. We first show that, under suitable conditions, the generative model is identifiable up to certain submanifold indeterminacies even in the presence of non-negligible noise. Furthermore, under structural or distributional variability conditions, we prove that the latent variables of general nonlinear models are identifiable up to trivial indeterminacies. Based on the proposed theoretical framework, we also develop corresponding estimation methods and validate them in various synthetic and real-world settings. Interestingly, our estimate of the true GDP growth from alternative measurements provides more insight into the economies than official reports. We expect our framework to provide new insight into how both researchers and practitioners deal with latent variables in real-world scenarios.
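To make the noise setting described in the abstract concrete, the following is a minimal illustrative sketch, not the paper's experimental setup. It draws a univariate latent and three measurements in which the noise is non-additive, has a scale that depends on the latent, and passes through nonlinear (in one case non-invertible) functions. The function name `simulate_noisy_measurements`, the specific nonlinearities, and all constants are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_noisy_measurements(n_samples=1000, seed=0):
    """Draw a latent z and three noisy measurements of it (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Univariate latent variable (assumed standard normal for this sketch).
    z = rng.normal(size=n_samples)

    # Noise whose scale depends on the latent: it is not independent of z.
    # The three noise columns are mutually independent given z.
    scale = 0.5 + 0.5 * np.abs(z)
    eps = rng.normal(size=(n_samples, 3)) * scale[:, None]

    # Noise entangled inside nonlinear functions rather than added afterwards.
    x1 = np.tanh(z + eps[:, 0])         # bounded nonlinearity
    x2 = (z + eps[:, 1]) ** 2           # non-invertible in its argument
    x3 = z * np.exp(0.3 * eps[:, 2])    # multiplicative noise

    x = np.stack([x1, x2, x3], axis=1)
    return z, x


if __name__ == "__main__":
    z, x = simulate_noisy_measurements(n_samples=5, seed=0)
    print(z.shape, x.shape)
```

In this toy example the measurements are independent of one another once the latent is fixed, while none of them is of the form "function of the latent plus independent noise", which is the kind of regime the abstract contrasts with additive-noise and noiseless assumptions.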
