Bayesian ICA with super-Gaussian Source Priors

Jyotishka Datta; Soham Ghosh; Nicholas G. Polson

TL;DR

This work develops a fully Bayesian framework for independent component analysis with super‑Gaussian sources by introducing horseshoe‑type priors via a Polya–Gamma scale‑mixture representation. The authors unify MAP estimation and full Bayesian posterior inference through a conjugate Gibbs sampler (Gibbs‑ICE) and exactly characterize posterior contraction and a Bernstein–von Mises limit for the unmixing matrix up to signed permutations. They prove a uniform LAN expansion around the true unmixing matrix, establish parametric $N^{-1/2}$ contraction in the $d_{\pm}$ metric, and demonstrate competitive performance against leading ICA methods across several source distributions. Additional theory covers envelope optimization, auxiliary‑function EM, and connections to nonlinear ICA and flow‑based models, while simulations validate accuracy in source recovery and reconstruction under various noise regimes. The work thus provides a principled Bayesian treatment of ICA with scalable computation and solid asymptotic guarantees, with implications for semiparametric extensions and nonlinear feature extraction.

Abstract

Independent Component Analysis (ICA) plays a central role in modern machine learning as a flexible framework for feature extraction. We introduce a horseshoe-type prior with a latent Polya-Gamma scale mixture representation, yielding scalable algorithms for both point estimation via expectation-maximization (EM) and full posterior inference via Markov chain Monte Carlo (MCMC). This hierarchical formulation unifies several previously disparate estimation strategies within a single Bayesian framework. We also establish the first theoretical guarantees for hierarchical Bayesian ICA, including posterior contraction and local asymptotic normality results for the unmixing matrix. Comprehensive simulation studies demonstrate that our methods perform competitively with widely used ICA tools. We further discuss implementation of conditional posteriors, envelope-based optimization, and possible extensions to flow-based architectures for nonlinear feature extraction and deep learning. Finally, we outline several promising directions for future work.
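As a worked illustration of the kind of Pólya-Gamma scale-mixture representation the abstract refers to, the hyperbolic secant source density listed in the outline can be written as a Gaussian scale mixture. The sketch below uses the standard Laplace transform of the $\mathrm{PG}(b,0)$ distribution; the symbols $\psi$, $\omega$, and $p_{\mathrm{PG}}$ are notation assumed here for illustration, and the paper's own parameterisation may differ.

```latex
% Laplace transform of the Polya-Gamma PG(b, 0) distribution
\frac{1}{\cosh^{b}(\psi/2)}
  = \int_0^\infty e^{-\omega \psi^{2}/2}\, p_{\mathrm{PG}}(\omega \mid b, 0)\, d\omega .

% Setting b = 1 and \psi = 2s gives the hyperbolic secant density
p(s) = \frac{1}{\pi \cosh(s)}
  = \frac{1}{\pi} \int_0^\infty e^{-2 \omega s^{2}}\, p_{\mathrm{PG}}(\omega \mid 1, 0)\, d\omega ,

% so, conditionally on the latent scale \omega, the source is Gaussian:
s \mid \omega \sim \mathcal{N}\!\left(0, \tfrac{1}{4\omega}\right).
```

Conditional Gaussianity of each source given its latent scale is what makes conjugate updates possible in a Gibbs scheme of this type.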

Paper Structure (40 sections, 8 theorems, 196 equations, 6 figures, 1 table, 1 algorithm)

Introduction
Connections with Previous Work
Hierarchical Independent Components Analysis
The generative model and the likelihood function
Exponential family representation and the posterior
Super-Gaussian Source Distributions
MacKay source distribution: hyperbolic secant
Gibbs sampling strategies
1. Update the sources $\bm S \mid \bm A,\bm T,\bm X$.
2. Update the mixing matrix $\bm A \mid \bm S,\bm X$.
3. Update the latent Pólya-Gamma scales $\bm T \mid \bm S$.
Jeffreys prior.
Student's $t$ prior [fevotte2004bayesian].
Other source distributions.
Theoretical results
...and 25 more sections

Key Result

Theorem 3

Under the ICA model and assumptions (A1)--(A4) and (P1), the posterior distribution for the unmixing matrix $\bm{W}$ concentrates at the parametric rate around the true signed-permutation class of $\bm{W}_0$: for any sequence $M_N\to\infty$, the posterior mass outside a $d_{\pm}$-neighbourhood of radius $M_N N^{-1/2}$ tends to zero, where the convergence is in probability under the true data-generating process $P_0$. Moreover, the posterior distribution is asymptotically Gaussian in the …
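Because the unmixing matrix is identified only up to signed permutations of its rows, any distance used to state contraction has to be invariant to them. The sketch below is one plausible form of such a metric, assumed for illustration and not taken from the paper: the minimum Frobenius distance between $\bm{W}_0$ and all signed row permutations of $\bm{W}$. The function name `signed_perm_distance` is hypothetical, and the brute-force search over permutations is only practical for a small number of components.

```python
import itertools
import math


def signed_perm_distance(W, W0):
    """Min over signed row permutations P of ||P W - W0||_F.

    Assumed illustrative form of a signed-permutation-invariant metric;
    W and W0 are square matrices given as lists of rows.
    """
    d = len(W0)

    def sq_dist(u, v, sign):
        # Squared Euclidean distance between row u and sign * row v.
        return sum((a - sign * b) ** 2 for a, b in zip(u, v))

    # cost[i][j]: best squared distance between row i of W0 and +/- row j of W.
    # The sign of each row can be chosen independently of the permutation.
    cost = [
        [min(sq_dist(W0[i], W[j], 1.0), sq_dist(W0[i], W[j], -1.0))
         for j in range(d)]
        for i in range(d)
    ]

    # Brute-force search over row permutations (fine for small d).
    best = min(
        sum(cost[i][perm[i]] for i in range(d))
        for perm in itertools.permutations(range(d))
    )
    return math.sqrt(best)


W0 = [[1.0, 2.0], [3.0, 4.0]]
W_equiv = [[-3.0, -4.0], [1.0, 2.0]]   # rows of W0 swapped, one sign flipped
W_near = [[1.0, 2.0], [3.0, 5.0]]      # one entry perturbed by 1

dist_equiv = signed_perm_distance(W_equiv, W0)
dist_near = signed_perm_distance(W_near, W0)
```

Here `dist_equiv` is zero because `W_equiv` lies in the same signed-permutation class as `W0`, while `dist_near` reflects the single perturbed entry. For larger dimensions the permutation search could be replaced by a linear assignment solver applied to the same cost matrix.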

Figures (6)

Figure 1: Posterior vs. true source densities (low noise).
Figure 2: Posterior vs. true source densities, Case 2 (one spiky component and higher noise).
Figure 3: Comparison of the densities for $\hat{\bm{s}}$ and $\bm{s}$ for MacKay's algorithm and the EM algorithm under the data-generating process \ref{eq:dgpnew} with $\sigma=0.01$ and $\sigma_2=1$.
Figure 4: Correlations between $\hat{\bm{s}}$ and $\bm{s}$ for the two optimisation methods in the first experiment.
Figure 5: Comparison of the densities for $\hat{\bm{s}}$ and $\bm{s}$ after rescaling the first Pólya--Gamma column by a factor of $100$ and setting $\sigma=0.1$ in \ref{eq:dgpnew}. Both methods have difficulty recovering the first source but perform similarly on the remaining components.
...and 1 more figure

Theorems & Definitions (10)

Remark 1
Definition 2
Theorem 3: Posterior contraction and Bernstein-von Mises theorem for ICA
Lemma 4: Uniform Local Asymptotic Normality
Lemma 5
Lemma 6: Third--order Taylor remainder
Lemma 7: Prior thickness / local flatness
Lemma 8
Lemma 9
Theorem 10 [polson2015mixtures]
