Exploring higher-order neural network node interactions with total correlation
Thomas Kerby, Teresa White, Kevin Moon
TL;DR
Higher-order interactions (HOIs) are fundamental, but the number of candidate interactions grows as $O(2^n)$ in the number of variables $n$, making global analysis intractable. Local CorEx uses PHATE to embed the data, partitions the embedding into clusters, and learns local latent factors by optimizing total correlation within each cluster, with latent factors satisfying $z = Wx + \epsilon$. It is demonstrated on synthetic data, the Communities dataset, and MNIST, and it extends to neural-network interpretability by identifying local groups of hidden nodes that influence specific logits; a dropout analysis reveals how redundancy grows in deeper layers. The approach provides a scalable, interpretable, unsupervised framework for discovering HOIs across heterogeneous data and neural representations.
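One way to read the $O(2^n)$ claim, assuming a candidate HOI is any subset of at least two of the $n$ variables (an assumption made here for illustration, not a definition taken from the paper), is to count those subsets:

$$\sum_{k=2}^{n} \binom{n}{k} = 2^n - n - 1$$

Under that reading, $n = 20$ variables already give $2^{20} - 20 - 1 = 1{,}048{,}555$ candidate subsets.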
Abstract
In domains such as ecological systems, collaborations, and the human brain, variables interact in complex ways. Yet accurately characterizing higher-order variable interactions (HOIs) is a difficult problem, and it is further exacerbated when the HOIs change across the data. To solve this problem, we propose a new method called Local Correlation Explanation (Local CorEx) that captures HOIs at a local scale by first clustering data points based on their proximity on the data manifold. We then use the total correlation, a multivariate generalization of mutual information, to construct a latent factor representation of the data within each cluster and thereby learn the local HOIs. We use Local CorEx to explore HOIs in synthetic and real-world data and to extract hidden insights about the data structure. Lastly, we demonstrate Local CorEx's suitability for exploring and interpreting the inner workings of trained neural networks.
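As a small illustration of why total correlation suits higher-order interactions, the sketch below estimates $TC(X) = \sum_i H(X_i) - H(X)$ from empirical frequencies of discrete data. It does not implement CorEx, PHATE, or any clustering step: the two "clusters" are hand-built toy data, and every function and variable name is hypothetical. In cluster A the third variable is the XOR of the first two, so every pair of variables is independent while the triple is not. In cluster B all three variables are independent.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Empirical Shannon entropy, in bits, of a list of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def total_correlation(rows):
    """TC(X) = sum_i H(X_i) - H(X), estimated from empirical frequencies."""
    columns = list(zip(*rows))
    marginal = sum(entropy(col) for col in columns)
    joint = entropy([tuple(r) for r in rows])
    return marginal - joint


def mutual_information(rows, i, j):
    """Pairwise mutual information I(X_i; X_j), i.e. the two-variable TC."""
    return total_correlation([(r[i], r[j]) for r in rows])


# Toy cluster A (assumed data): x3 = x1 XOR x2, a purely three-way dependence.
cluster_a = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
# Toy cluster B (assumed data): all eight binary triples, equally likely.
cluster_b = list(product((0, 1), repeat=3))

tc_a = total_correlation(cluster_a)  # 1 bit: the XOR constraint
tc_b = total_correlation(cluster_b)  # 0 bits: fully independent
# Pooling both clusters (equal weight) dilutes the local structure.
tc_pooled = total_correlation(cluster_a * 2 + cluster_b)
# Every pairwise mutual information in cluster A is zero.
mi_a = [mutual_information(cluster_a, i, j) for i, j in ((0, 1), (0, 2), (1, 2))]

print(f"TC cluster A: {tc_a:.3f} bits, pairwise MI: {mi_a}")
print(f"TC cluster B: {tc_b:.3f} bits")
print(f"TC pooled:    {tc_pooled:.3f} bits")
```

In this toy setting a pairwise analysis of cluster A finds nothing, while the total correlation recovers the full bit of shared structure, and the pooled estimate falls well below the cluster A value. That gap is the kind of effect a local analysis is meant to avoid.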
