A Stable Neural Statistical Dependence Estimator for Autoencoder Feature Analysis

Bo Hu; Jose C Principe

Abstract

Statistical dependence measures such as mutual information are ideal for analyzing autoencoders, but they can be ill-posed for deterministic, static, noise-free networks. We adopt the variational (Gaussian) formulation, which makes the dependence among inputs, latents, and reconstructions measurable, and we propose a stable neural dependence estimator based on an orthonormal density-ratio decomposition. Unlike MINE, our method avoids input concatenation and product-of-marginals re-pairing, which reduces computational cost and improves stability. We introduce an efficient NMF-like scalar objective and demonstrate empirically that assuming Gaussian noise to form an auxiliary variable enables meaningful dependence measurements and supports quantitative feature analysis, with the singular values converging sequentially.

Paper Structure (28 sections, 39 equations, 30 figures, 11 tables, 2 algorithms)

Introduction
Orthonormal Decomposition of the Density Ratio
Applying A Statistical Dependence Estimator to An Autoencoder
Experiments
Conclusion
Extended Experimental Results
Extra tables
Eigenanalysis
Visualizing isomorphism
Formal Proofs
Proof for the new NMF-like cost
Proof for the trace and logdet costs
Feature Learning: Beyond Feature Analysis
Noisy data and noisy features
Interpretation via the substitution pattern
...and 13 more sections

Figures (30)

Figure 1: Learning curves for the NMF-like cost. The curves are smooth and stable because no re-pairing is required.
Figure 2: A learning curve of MINE on MNIST. The sudden "dip" in the curve is largely due to the re-pairing step for sampling from the product of marginals. The learning curve would be smoother if we lowered the learning rate, but convergence would take significantly longer.
Figure 3: Learning curves of singular values.
Figure 4: Two-moon dataset: left and right singular functions for the pairs $\{X,Y'\}$ and $\{X,\widehat{\,X\,}\}$. (a), (c), and (d) display 2D singular functions as heatmaps (nine functions shown per panel). (b) displays 1D singular functions as curves (six functions shown).
Figure 5: MNIST: left and right singular functions for the pairs $\{X,Y'\}$ and $\{X,\widehat{\,X\,}\}$. In (a) we have excluded the trivial constant singular function, whose singular value is always $1$.
...and 25 more figures
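
The singular values and singular functions mentioned in the captions above can be illustrated on a small discrete example. The sketch below is not the paper's neural estimator or its NMF-like objective. It assumes a finite joint probability table with strictly positive marginals and uses the classical SVD of the normalized joint $P(x,y)/\sqrt{p(x)\,p(y)}$. The function name `density_ratio_svd` and the toy tables are hypothetical and chosen only for illustration. In this toy setting the leading singular value is always 1 with constant singular functions, independence leaves no other nonzero singular value, and a deterministic noise-free mapping pushes every singular value to 1, which hints at why dependence measures can be ill-posed without added noise.

```python
import numpy as np


def density_ratio_svd(joint):
    """Decompose a discrete joint table via the SVD of P(x,y)/sqrt(p(x)p(y)).

    Assumption: `joint` is a nonnegative 2D table whose row and column
    sums are all strictly positive. This is a classical discrete
    illustration, not the neural estimator proposed in the paper.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()          # normalize to a probability table
    px = joint.sum(axis=1)               # marginal of X
    py = joint.sum(axis=0)               # marginal of Y
    normalized = joint / np.sqrt(np.outer(px, py))
    u, s, vt = np.linalg.svd(normalized)
    # Rescale the singular vectors so that the resulting functions are
    # orthonormal with respect to the marginals p(x) and p(y).
    left = u / np.sqrt(px)[:, None]
    right = vt.T / np.sqrt(py)[:, None]
    return s, left, right, px, py


# Independent pair: only the trivial singular value is nonzero.
s_indep, left_indep, _, _, _ = density_ratio_svd(
    np.outer([0.2, 0.3, 0.5], [0.6, 0.4])
)

# Noisy dependence: the second singular value lies strictly between 0 and 1.
s_noisy, _, _, _, _ = density_ratio_svd([[0.3, 0.1], [0.1, 0.5]])

# Deterministic, noise-free mapping (Y = X): every singular value equals 1.
s_det, _, _, _, _ = density_ratio_svd(np.diag([0.2, 0.3, 0.5]))

print("independent:  ", np.round(s_indep, 6))
print("noisy:        ", np.round(s_noisy, 6))
print("deterministic:", np.round(s_det, 6))
```

Excluding the leading singular value, as the caption of Figure 5 does for its constant singular function, leaves the nontrivial spectrum, which is the part that carries information about dependence in this toy example.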
