$σ$-PCA: a building block for neural learning of identifiable linear transformations

Fahdi Kanavati; Lucy Katsnith; Masayuki Tsuneki

TL;DR

σ-PCA introduces a unified neural framework that blends linear and nonlinear PCA by explicitly incorporating a diagonal scale matrix $\boldsymbol{\Sigma}$ into the encoder before the nonlinearity, thereby eliminating the subspace rotational indeterminacy without whitening the inputs. The model recovers both rotations of the SVD in a linear-ICA-like decomposition, enabling identifiable, dimensionality-reducing transformations beyond traditional PCA. By analyzing gradients, selecting nonlinearities, and tracking the mean and variance with exponential moving averages (EMA), σ-PCA shows that nonlinear PCA can operate directly on unwhitened data and disentangle equal-variance components, bridging PCA, nonlinear PCA, and ICA. Experiments on image patches and time series show improved disentanglement and robustness to variance degeneracy, supporting σ-PCA as a practical building block for identifiable linear transformations in neural learning. Overall, the work provides a principled path to identifiability without whitening, unifying core unsupervised learning approaches under a single neural objective.
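The TL;DR mentions two mechanical ingredients: a diagonal scale $\boldsymbol{\Sigma}$ applied in the encoder before the nonlinearity, and mean/variance tracking via EMA. The NumPy code below is a hypothetical sketch of how these pieces might fit together in a forward pass. It keeps EMA estimates of the input mean and of each latent's variance, and divides each latent by its running standard deviation (standing in for $\boldsymbol{\Sigma}$) before applying tanh. Several choices are our assumptions rather than anything stated in the paper: the names `EMAStats` and `encode`, the momentum of 0.99, tanh as the activation, and a fixed projection `W`. The sketch does not train `W` and is not the paper's implementation.

```python
import numpy as np


class EMAStats:
    """Per-feature running mean and variance via exponential moving averages.

    Hypothetical helper: the momentum default and update rule are assumptions.
    """

    def __init__(self, dim, momentum=0.99):
        self.momentum = momentum
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)

    def update(self, batch):
        # batch has shape (n, dim); blend its statistics into the running ones
        m = self.momentum
        self.mean = m * self.mean + (1.0 - m) * batch.mean(axis=0)
        self.var = m * self.var + (1.0 - m) * batch.var(axis=0)


def encode(x, W, x_stats, z_stats, eps=1e-8):
    """Hypothetical sigma-scaled encoder: centre, project, rescale, squash."""
    z = (x - x_stats.mean) @ W          # linear projection, W has shape (d, k)
    sigma = np.sqrt(z_stats.var + eps)  # diagonal of Sigma: one scale per latent
    return np.tanh(z / sigma), z        # the nonlinearity sees unit-scale latents


rng = np.random.default_rng(0)
d, k = 3, 2
true_mean = np.array([1.0, -2.0, 0.5])
true_std = np.array([3.0, 1.0, 0.2])
W = np.eye(d)[:, :k]  # fixed projection that keeps the first two input axes

x_stats, z_stats = EMAStats(d), EMAStats(k)
for _ in range(2000):
    batch = true_mean + true_std * rng.standard_normal((256, d))
    x_stats.update(batch)
    _, z = encode(batch, W, x_stats, z_stats)
    z_stats.update(z)

print("running input mean:", np.round(x_stats.mean, 2))
print("running latent std:", np.round(np.sqrt(z_stats.var), 2))
```

Given enough batches, the running estimates should settle near the generating mean and standard deviations. As a result, the inputs to tanh have roughly unit scale whatever the raw input variances are.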

Abstract

Linear principal component analysis (PCA) learns (semi-)orthogonal transformations by orienting the axes to maximize variance. Consequently, it can only identify orthogonal axes whose variances are clearly distinct; it cannot identify subsets of axes whose variances are roughly equal. It cannot eliminate the subspace rotational indeterminacy: it fails to disentangle components with equal variances (eigenvalues), resulting in randomly rotated axes within each eigensubspace. In this paper, we propose $σ$-PCA, a method that (1) formulates a unified model for linear and nonlinear PCA, the latter being a special case of linear independent component analysis (ICA), and (2) introduces a missing piece into nonlinear PCA that allows it to eliminate the subspace rotational indeterminacy from the canonical linear PCA solution, without whitening the inputs. Whitening, a preprocessing step that transforms the inputs into uncorrelated, unit-variance inputs, has generally been a prerequisite for linear ICA methods. This meant that conventional nonlinear PCA could not necessarily preserve the orthogonality of the overall transformation, could not directly reduce dimensionality, and could not intrinsically order components by variance. We offer insights on the relationship between linear PCA, nonlinear PCA, and linear ICA: three methods with autoencoder formulations for learning special linear transformations from data, transformations that are (semi-)orthogonal for PCA and arbitrary unit-variance for ICA. As part of our formulation, nonlinear PCA can be seen as a method that maximizes both variance and statistical independence. It lies midway between linear PCA and linear ICA and serves as a building block for learning identifiable linear transformations.
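The identifiability argument above, also depicted by the rectangle, square, and circle cases of Figure 1, can be reproduced on toy data. The NumPy sketch below assumes two-dimensional sources mixed by a known 30° rotation. Linear PCA via SVD recovers the axes of an ellipse up to sign. For equal-variance sources, however, the covariance is close to the identity and carries no rotational information. Non-Gaussianity still distinguishes directions for uniform ("square") sources but not for Gaussian ("circle") ones. We use excess kurtosis of 1-D projections as an illustrative stand-in for statistical independence; it is not the objective σ-PCA optimizes.

```python
import numpy as np


def rotation(theta):
    """2x2 rotation matrix by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def pca(x):
    """Linear PCA via SVD: returns variances and axes (rows), largest first."""
    xc = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    return s**2 / (len(x) - 1), vt


def excess_kurtosis(u):
    """Excess kurtosis of a 1-D sample: 0 for a Gaussian, -1.2 for a uniform."""
    u = (u - u.mean()) / u.std()
    return np.mean(u**4) - 3.0


def kurtosis_profile(x, angles):
    """Excess kurtosis of x projected onto the unit vector at each angle."""
    return np.array([excess_kurtosis(x @ np.array([np.cos(a), np.sin(a)]))
                     for a in angles])


rng = np.random.default_rng(0)
n = 200_000
R = rotation(np.pi / 6)  # orthogonal mixing that we pretend not to know

# Ellipse: distinct variances (2^2 and 1^2), so PCA recovers the axes up to sign.
ellipse = (rng.standard_normal((n, 2)) * np.array([2.0, 1.0])) @ R.T
_, axes_ellipse = pca(ellipse)

# Square and circle: equal unit variances, so both covariances are ~identity
# and the PCA axes carry no information about R.
square = rng.uniform(-np.sqrt(3), np.sqrt(3), (n, 2)) @ R.T
circle = rng.standard_normal((n, 2)) @ R.T
var_square, _ = pca(square)
var_circle, _ = pca(circle)

# Non-Gaussianity of projections still pins down the square's rotation,
# whereas the circle looks the same in every direction.
angles = np.deg2rad(np.arange(0, 90))
k_square = kurtosis_profile(square, angles)
k_circle = kurtosis_profile(circle, angles)
best_angle = angles[np.argmin(k_square)]

print("ellipse |axes @ R|:\n", np.round(np.abs(axes_ellipse @ R), 3))
print("variances:", np.round(var_square, 3), np.round(var_circle, 3))
print("angle recovered from the square (deg):", np.rad2deg(best_angle))
```

Scanning angles by brute force only works in two dimensions. The sketch is meant to show where the rotational information lives, not to serve as an ICA algorithm.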

Paper Structure

This paper contains 88 sections, 178 equations, 24 figures, and 1 table.

Introduction
Preliminaries
Learning linear transformations from data and the notion of identifiability
Singular value decomposition (SVD)
Linear PCA via SVD
Degrees of identifiability via variance and independence
Semi-orthogonal transformation
Non-orthogonal transformation
Conventional neural linear PCA
Conventional neural nonlinear PCA
$σ$-PCA: a unified neural model for linear and nonlinear PCA
A closer look at the gradient
Nonlinear PCA: an ICA method that maximizes both variance and independence
Relationship between linear PCA, nonlinear PCA, and linear ICA
Choice of nonlinearity for sub- and super-Gaussian input distributions
...and 73 more sections

Figures (24)

Figure 1: Identifiability of rotations using variance or independence maximization. The aim is to find a rotation that transforms $\mathbf{x}$ into $\mathbf{y}$, aligning the axes. Red indicates unaligned; green, aligned. For the rectangle and the ellipse, representing distinct variances, we can figure out a rotation, except that we do not know if we should flip up or down, left or right -- a sign indeterminacy. For the square and the circle, representing equal variance, we cannot figure out a rotation only from the variance. For the square, however, we can use independence to figure out a rotation, except that we do not know if we should flip and/or permute the vertical and horizontal axes -- sign and permutational indeterminacies. For the circle, nothing can be done. There are no favoured directions; even if we apply a rotation, it would remain the same -- a rotational indeterminacy. A Gaussian is like a circle.
Figure 2: Linear PCA, nonlinear PCA, and linear ICA. Green indicates alignment of axes.
Figure 3: Three distributions with unit variance (a): uniform distribution (sub-Gaussian), Gaussian distribution, and Laplace distribution (super-Gaussian). Shaded in grey is any $|x| \geq 2\sigma$. When using $\tanh$ without any scale adjustment, we can see that for the super-Gaussian distribution (or for any heavy-tailed distribution), values beyond $2\sigma$ might have their reconstruction impaired because of the squashing by $\tanh$ (b). A remedy, in this case, would be to use $a\tanh(x/a)$ with $a \geq 1$. For a sub-Gaussian distribution, it is more suited to use $a \leq 1$ as the values are within $2\sigma$.
Figure 4: A set of 32 11×11 px filters obtained on patches from the CIFAR-10 dataset. We obtained similar filters with the proposed unit-norm-preserving linear PCA loss (b) as with SVD (a). The filters obtained by nonlinear PCA (c, d) seem to have further separated the mixed filters from linear PCA. This is most obvious in the vertical and horizontal line filters at different positions, which nonlinear PCA has unmixed. Setting $a = 1$ is not enough to separate the filters (c); a larger $a$, such as $a = 4$, is required (d). We obtained meaningless filters in the PCA subspace with the symmetric linear PCA loss (e). We also obtained meaningless filters when the contribution of the decoder was included (f). The same holds for conventional nonlinear PCA without whitening (g). FastICA (h) relaxes the orthogonality assumption of the overall transformation, so its filters are not necessarily orthogonal; however, there is some overlap with the nonlinear PCA filters.
Figure 5: Three signals (sinusoidal, square, and sawtooth) that were mixed with an orthogonal mixing matrix. Linear PCA separated the sinusoidal signal as it had a distinct variance, but did not separate the square and the sawtooth signals as they had the same variance. Nonlinear PCA separated the signals and recovered their variances.
...and 19 more figures
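The squashing argument of Figure 3 can be checked numerically. The NumPy sketch below draws unit-variance uniform, Gaussian, and Laplace samples and measures the fraction with $|x| \geq 2\sigma$. It then reports the mean squared error of reconstructing $x$ from $a\tanh(x/a)$ for a few values of $a$. It measures reconstruction only, not separation quality, and the scales 0.5, 1, and 4 are arbitrary illustrative choices.

```python
import numpy as np


def scaled_tanh(x, a):
    """a * tanh(x / a): close to the identity for |x| << a, saturates at +/- a."""
    return a * np.tanh(x / a)


rng = np.random.default_rng(0)
n = 1_000_000
# Unit-variance samples: uniform on [-sqrt(3), sqrt(3)], standard normal,
# and Laplace with scale 1/sqrt(2) (a Laplace with scale b has variance 2 b^2).
samples = {
    "uniform": rng.uniform(-np.sqrt(3), np.sqrt(3), n),
    "gaussian": rng.standard_normal(n),
    "laplace": rng.laplace(0.0, 1.0 / np.sqrt(2), n),
}

scales = (0.5, 1.0, 4.0)
results = {}
for name, x in samples.items():
    tail = np.mean(np.abs(x) >= 2.0)  # mass beyond 2 sigma (grey region in Figure 3a)
    # Mean squared error of reconstructing x from a * tanh(x / a)
    mse = {a: np.mean((x - scaled_tanh(x, a)) ** 2) for a in scales}
    results[name] = (tail, mse)
    print(f"{name:8s}  P(|x| >= 2) = {tail:.4f}  "
          + "  ".join(f"a={a}: mse={m:.4f}" for a, m in mse.items()))
```

The uniform samples never reach $2\sigma$, because a unit-variance uniform is bounded by $\sqrt{3} \approx 1.73$. The Laplace tail mass is $e^{-2\sqrt{2}} \approx 0.059$, slightly above the Gaussian's $\approx 0.046$. At $a = 1$ the Laplace samples also have the largest reconstruction error of the three, and for every distribution the error shrinks as $a$ grows.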