$σ$-PCA: a building block for neural learning of identifiable linear transformations
Fahdi Kanavati, Lucy Katsnith, Masayuki Tsuneki
TL;DR
σ-PCA is a unified neural framework for linear and nonlinear PCA that explicitly inserts a diagonal scale matrix $\boldsymbol{\Sigma}$ into the encoder before the nonlinear activation, which eliminates the subspace rotational indeterminacy without whitening the inputs. The model recovers both rotations of the SVD in a linear-ICA-like decomposition, enabling identifiable, dimensionality-reducing transformations beyond traditional PCA. By analyzing gradients, selecting nonlinearities, and handling the mean and variance with an exponential moving average (EMA), σ-PCA shows that nonlinear PCA can operate directly on unwhitened data and disentangle equal-variance components, bridging PCA, nonlinear PCA, and ICA. Experiments on image patches and time series illustrate improved disentanglement and robustness to variance degeneracy, supporting σ-PCA as a practical building block for identifiable linear transformations in neural learning. Overall, the work offers a principled path to identifiability without whitening, unifying these unsupervised learning approaches under a single neural objective.
Abstract
Linear principal component analysis (PCA) learns (semi-)orthogonal transformations by orienting the axes to maximize variance. Consequently, it can only identify orthogonal axes whose variances are clearly distinct; it cannot identify the subsets of axes whose variances are roughly equal. In other words, it cannot eliminate the subspace rotational indeterminacy: it fails to disentangle components with equal variances (eigenvalues), so the axes within each eigen-subspace come out randomly rotated. In this paper, we propose $σ$-PCA, a method that (1) formulates a unified model for linear and nonlinear PCA, the latter being a special case of linear independent component analysis (ICA), and (2) introduces a missing piece into nonlinear PCA that allows it to eliminate, from the canonical linear PCA solution, the subspace rotational indeterminacy -- without whitening the inputs. Whitening, a preprocessing step that converts the inputs into unit-variance inputs, has generally been a prerequisite for linear ICA methods, which meant that conventional nonlinear PCA could not necessarily preserve the orthogonality of the overall transformation, could not directly reduce dimensionality, and could not intrinsically order components by variance. We offer insights on the relationship between linear PCA, nonlinear PCA, and linear ICA -- three methods with autoencoder formulations for learning special linear transformations from data: transformations that are (semi-)orthogonal for PCA and arbitrary with unit variance for ICA. In our formulation, nonlinear PCA can be seen as a method that maximizes both variance and statistical independence, lying in the middle between linear PCA and linear ICA and serving as a building block for learning linear transformations that are identifiable.
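The subspace rotational indeterminacy described above can be illustrated numerically. The following is a minimal NumPy sketch, not the σ-PCA method itself: the uniform sources, the 30-degree ground-truth rotation, the sample size, and the helper names are all assumptions chosen for illustration, and excess kurtosis is used only as a simple stand-in for a measure of statistical independence. It scans candidate directions and compares what a variance criterion and a non-Gaussianity criterion can each recover.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000  # assumed sample size, chosen only to keep estimates stable

# Two independent, unit-variance, non-Gaussian (uniform) sources.
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, 2))

# Hypothetical ground truth: source axes rotated by 30 degrees.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x_equal = s @ R.T                              # both components have variance 1
x_distinct = (s * np.array([3.0, 1.0])) @ R.T  # component variances 9 and 1

# Candidate unit directions from 0 to 90 degrees in 1-degree steps.
angles = np.linspace(0.0, np.pi / 2, 91)
directions = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, 91)

def directional_variance(x):
    """Variance of x projected onto each candidate direction."""
    return np.var(x @ directions, axis=0)

def directional_excess_kurtosis(x):
    """Excess kurtosis of x projected onto each candidate direction."""
    p = x @ directions
    p = (p - p.mean(axis=0)) / p.std(axis=0)
    return np.mean(p**4, axis=0) - 3.0

var_equal = directional_variance(x_equal)
var_distinct = directional_variance(x_distinct)
kurt_equal = directional_excess_kurtosis(x_equal)

# Distinct variances: the variance criterion peaks at the true axis.
pca_axis_deg = np.rad2deg(angles[np.argmax(var_distinct)])

# Equal variances: variance is nearly flat over all directions, but the
# magnitude of the excess kurtosis still peaks at the true axis.
ica_axis_deg = np.rad2deg(angles[np.argmax(np.abs(kurt_equal))])

print("variance spread (distinct variances):", var_distinct.max() - var_distinct.min())
print("variance spread (equal variances):   ", var_equal.max() - var_equal.min())
print("axis from variance (distinct case), degrees:", pca_axis_deg)
print("axis from kurtosis (equal case), degrees:   ", ica_axis_deg)
```

In the equal-variance case the variance curve carries essentially no information about the true axes, which is the degeneracy that linear PCA cannot resolve; a higher-order statistic still distinguishes them, which is the kind of signal that ICA-style objectives exploit.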
