Identifiability of overcomplete independent component analysis

Kexin Wang; Anna Seigal

TL;DR

The identifiability of ICA is generalized to the overcomplete setting, where the number of sources exceeds the number of observations, and an if and only if characterization of the identifiability of overcomplete ICA is given.

Abstract

Independent component analysis (ICA) studies mixtures of independent latent sources. An ICA model is identifiable if the mixing can be recovered uniquely. It is well-known that ICA is identifiable if and only if at most one source is Gaussian. However, this applies only to the setting where the number of sources is at most the number of observations. In this paper, we generalize the identifiability of ICA to the overcomplete setting, where the number of sources exceeds the number of observations. We give an if and only if characterization of the identifiability of overcomplete ICA. The proof studies linear spaces of rank one symmetric matrices. For generic mixing, we present an identifiability condition in terms of the number of sources and the number of observations. We use our identifiability results to design an algorithm to recover the mixing matrix from data and apply it to synthetic data and two real datasets.
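To make the overcomplete setting concrete, here is a minimal simulation sketch of the model $\mathbf{x} = A\mathbf{s}$ with more sources ($J$) than observations ($I$). The sizes $I=3$, $J=5$, the function name `simulate_overcomplete_ica`, and the choice of centered exponential sources with one standard Gaussian source are illustrative assumptions, not the authors' code.

```python
import numpy as np


def simulate_overcomplete_ica(A, n_samples, rng):
    """Draw samples x = A s with independent sources; the last source is Gaussian."""
    n_sources = A.shape[1]
    # Non-Gaussian sources: centered exponential with parameter 1 (illustrative choice).
    s = rng.exponential(1.0, size=(n_sources, n_samples)) - 1.0
    # Replace the last source by a standard Gaussian, so exactly one source is Gaussian.
    s[-1] = rng.standard_normal(n_samples)
    return A @ s, s


rng = np.random.default_rng(0)
I, J = 3, 5  # overcomplete: more sources (J) than observations (I)
A = rng.standard_normal((I, J))  # a randomly drawn mixing matrix
x, s = simulate_overcomplete_ica(A, 10_000, rng)
print(x.shape)  # (3, 10000)
```

Only `x` would be observed in practice. The identifiability question is whether the columns of `A` can be recovered from the distribution of `x` alone.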

Paper Structure (20 sections, 34 theorems, 35 equations, 7 figures, 1 algorithm)

This paper contains 20 sections, 34 theorems, 35 equations, 7 figures, 1 algorithm.

Introduction
Characterization of identifiability
Sufficiency
Necessity
From identifiability to systems of quadrics
Systems of quadrics
Complex solutions to a system of quadrics
Real solutions to a system of quadrics
From complex to real identifiability
The projected second Veronese
Generic identifiability
Identifiable and non-identifiable matrices
Large identifiable matrices
Low-rank identifiable matrices
Non-identifiable matrices
...and 5 more sections

Key Result

Theorem 1.1

Consider the ICA model $\mathbf{x}=A\mathbf{s}$, where $\mathbf{s} = (s_1,\ldots,s_I)^\mathsf{T}$ is a vector of non-degenerate independent sources, $\mathbf{x}=(x_1,\ldots,x_I)^\mathsf{T}$ is a vector of observations, and $A \in \mathbb{R}^{I \times I}$ is invertible. Identifiability holds if and only if at most one source is Gaussian.
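The following short derivation sketches why two Gaussian sources obstruct identifiability in the square case. It assumes $I = 2$, standard Gaussian sources, and the usual convention that the mixing matrix is recovered only up to permuting and rescaling its columns. The rotation $Q$ and angle $\theta$ are notation introduced here for illustration.

```latex
\[
  \mathbf{x} = A\mathbf{s} = (AQ^{\mathsf{T}})(Q\mathbf{s}),
  \qquad
  Q = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]
% If s ~ N(0, I_2), then Qs ~ N(0, Q Q^T) = N(0, I_2),
% so the entries of Qs are again independent standard Gaussians.
\[
  AQ^{\mathsf{T}}
  = \begin{pmatrix}
      \cos\theta\,\mathbf{a}_1 - \sin\theta\,\mathbf{a}_2
      & \sin\theta\,\mathbf{a}_1 + \cos\theta\,\mathbf{a}_2
    \end{pmatrix},
  \qquad A = \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 \end{pmatrix}.
\]
% Since A is invertible, a_1 and a_2 are linearly independent, so the columns of
% A Q^T are rescaled columns of A only when theta is a multiple of pi/2.
```

For any other $\theta$, the matrices $A$ and $AQ^{\mathsf{T}}$ give the same distribution of $\mathbf{x}$ but are not related by permutation and scaling of columns, so the mixing cannot be recovered.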

Figures (7)

Figure 1: Illustration of Theorem \ref{thm: iff for identifiablity}
Figure 2: Relative Frobenius error using population cumulants. We fix the fourth cumulant of the non-Gaussian sources to be $6$, the second cumulant to be $1$, and consider a standard Gaussian as the Gaussian source. We run 1000 experiments on each pair $(I,J)$ and plot the mean relative Frobenius error. The black dashed lines are the identifiability thresholds from Theorem \ref{thm: real generic matrix result}: ${I\choose 2}+1$ for $I=6,7$ and ${I\choose 2}$ for $I=8,9$. The errors are low for $J$ below the threshold and increase beyond it. The small increase in error from $J={I\choose 2}$ to ${I\choose 2}+1$ for $I=6,7$ is due to the positive probability of non-identifiability when $J = {I \choose 2}+1$; see Theorem \ref{thm: real generic matrix result}.
Figure 3: Relative Frobenius error for differing Gaussian source variance. We consider variances in the set $\{ 0.01, 0.1, 1, 10, 100 \}$. We fix $I=6$. The black dashed lines mark the threshold $J={I\choose 2}+1=16$. For each matrix size and variance, we run the experiment 1000 times and plot the mean. As the variance of the Gaussian source increases, the relative Frobenius error decreases. In the left figure, we use $1000(I+J)$ iterations in Powell's method. On the right, we increase the number of iterations to $500000$, which makes the algorithm more stable to changes in the variance.
Figure 4: Relative Frobenius error with sample cumulant tensors. We take our non-Gaussian sources to be exponential sources with parameter 1 (left) and Student $t$-distributed sources with five degrees of freedom (right). We set the Gaussian source to be a standard Gaussian. We fix $I = 6$. For each pair $(I,J)$, we run 1000 experiments and plot the mean relative Frobenius error. In both plots, the error decreases as the sample size increases. We plot the population cumulant method (labelled as 'inf') for comparison.
Figure 5: We divide the $7 \times 7$ images in half to give two datasets, each with $400000 = \frac{1}{2} (16 \times 50000)$ datapoints of dimension $49$, keeping the number of images from each class roughly the same between the two halves. We apply Algorithm \ref{alg:recover A} to the two datasets and assess the similarity of the output. We obtain two matrices $A \in \mathbb{R}^{49\times J}$, where $J$ is the number of sources. We illustrate the results for $J=114$. The columns of the two $49\times 114$ matrices are plotted as grayscale $7 \times 7$ images. We observe the visual agreement of the $114$ images, reflecting identifiability. The last image is the Gaussian source, which represents the Gaussian noise in the images. The two Gaussian sources have cosine similarity $0.99$. Their grayscale plots show that the pixel patterns have more Gaussian noise at the center than at the edges.
...and 2 more figures
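The quantities named in the captions above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: `caption_threshold` only encodes the four values stated in the Figure 2 caption, `relative_frobenius_error` assumes the estimated columns have already been aligned (permuted and rescaled) to the true ones, and `sample_fourth_cumulant` is a plain plug-in estimator that may differ from the one used in the paper. All three function names are hypothetical.

```python
from math import comb

import numpy as np

# Offsets read off the Figure 2 caption only: +1 for I = 6, 7 and +0 for I = 8, 9.
CAPTION_OFFSETS = {6: 1, 7: 1, 8: 0, 9: 0}


def caption_threshold(I):
    """Value of J marked by the dashed line, for I in {6, 7, 8, 9}."""
    return comb(I, 2) + CAPTION_OFFSETS[I]


def relative_frobenius_error(A_true, A_est):
    """||A_true - A_est||_F / ||A_true||_F, assuming columns are already aligned."""
    return np.linalg.norm(A_true - A_est) / np.linalg.norm(A_true)


def sample_fourth_cumulant(x):
    """Plug-in fourth-order cumulant tensor of data x with shape (I, n_samples)."""
    n = x.shape[1]
    xc = x - x.mean(axis=1, keepdims=True)  # center each coordinate
    m2 = xc @ xc.T / n  # sample second moments E[x_i x_j]
    m4 = np.einsum("in,jn,kn,ln->ijkl", xc, xc, xc, xc) / n  # sample fourth moments
    # Subtract the three pairings of second moments to pass from moments to cumulants.
    return (
        m4
        - np.einsum("ij,kl->ijkl", m2, m2)
        - np.einsum("ik,jl->ijkl", m2, m2)
        - np.einsum("il,jk->ijkl", m2, m2)
    )


thresholds = {I: caption_threshold(I) for I in CAPTION_OFFSETS}
print(thresholds)  # {6: 16, 7: 22, 8: 28, 9: 36}

# A symmetric two-point sample has variance 1 and fourth moment 1, so kappa_4 = 1 - 3 = -2.
two_point = np.array([[-1.0, 1.0, -1.0, 1.0]])
kappa = sample_fourth_cumulant(two_point)
print(kappa[0, 0, 0, 0])  # -2.0
```

The value $16$ for $I=6$ agrees with the threshold quoted in the Figure 3 caption. An exponential source with parameter 1 has population fourth cumulant $3! = 6$, which matches the value fixed in the population-cumulant experiment of Figure 2.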

Theorems & Definitions (71)

Theorem 1.1: comon1994independent
Theorem 1.2: The Darmois–Skitovich theorem darmois1953analyse, skitovitch1953property, skitovivc1962linear
Definition 1.3
Remark 1.4: Recovering mixing vs. sources
Theorem 1.5
Example 1.6
Example 1.7
Example 1.8
Theorem 1.9
Proposition 2.1
...and 61 more
