Biology-inspired joint distribution neurons based on Hierarchical Correlation Reconstruction allowing for multidirectional propagation of values and densities
Jarek Duda
TL;DR
The paper tackles the gap between artificial neural networks and biology by introducing joint-distribution neurons based on Hierarchical Correlation Reconstruction (HCR). It presents HCRNN, where neural units represent joint densities $\rho(\mathbf{x})$ via $a_{\mathbf{j}}$ coefficients, enabling multidirectional propagation of both values and probability densities, and supports local training via information bottleneck and tensor decomposition. Key contributions include a practical density-parametrization of neurons, conditional-density and density-propagation rules, a framework for mutual information estimation from mixed moments, and IB-based training strategies. The approach promises more robust, probabilistic processing and potential paths toward biology-inspired AI that can better handle uncertainty and bidirectional computation.
Abstract
Recently, a million biological neurons (BNN) outperformed modern RL methods at playing Pong~\cite{RL}, a reminder that biological neurons remain qualitatively superior, e.g. in learning, flexibility and robustness. This suggests trying to improve current artificial neurons, e.g. MLP/KAN, for better agreement with biological ones. An extension of the KAN approach is proposed, to neurons containing a model of the local joint distribution: $\rho(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ for $\mathbf{x} \in [0,1]^d$, which adds interpretation and information flow control to KAN and allows gradually adding three basic properties of biological neurons that are currently missing: 1) Biological axons propagate in both directions~\cite{axon}, while current artificial neurons focus on unidirectional propagation; joint distribution neurons can repair this by substituting some variables and obtaining conditional values/distributions for the remaining ones. 2) Animals show risk avoidance~\cite{risk}, which requires processing variance, and the real world generally calls for probabilistic models; the proposed neurons can also predict and propagate distributions as vectors of moments: (expected value, variance) or higher. 3) Biological neurons require local training; besides backpropagation, the proposed approach allows many additional training methods, such as direct training, tensor decomposition, or the local and very promising information bottleneck. The proposed approach is very general and can also be used as an extension of softmax $\textrm{Pr}\propto \exp(-E)$, e.g. in transformer embeddings, to probability distributions described by a few moments $(a_j)$: $\rho(x)\approx \sum_j a_j f_j(x)$.
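
The parametrization $\rho(\mathbf{x})=\sum_{\mathbf{j}} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ and the substitution of a known variable to obtain a conditional value can be illustrated with a minimal sketch for $d=2$. Several choices here are assumptions rather than statements from the abstract: the basis $f_\mathbf{j}$ is taken as a product of orthonormal shifted Legendre polynomials on $[0,1]$, the coefficients $a_\mathbf{j}$ are estimated as sample means of $f_\mathbf{j}(\mathbf{x})$, and the function names (`basis`, `fit_joint`, `density`, `conditional_mean`) as well as the toy data are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def basis(x, m):
    """Orthonormal shifted Legendre polynomials f_0..f_m on [0,1].

    Returns an array of shape (len(x), m + 1). Assumed basis choice.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # P_j(2x - 1) has squared norm 1/(2j + 1) on [0,1], so rescale by sqrt(2j + 1).
    V = legendre.legvander(2.0 * x - 1.0, m)
    return V * np.sqrt(2.0 * np.arange(m + 1) + 1.0)


def fit_joint(X, m):
    """Estimate a[j1, j2] as the sample mean of f_j1(x1) * f_j2(x2)."""
    F1 = basis(X[:, 0], m)
    F2 = basis(X[:, 1], m)
    return F1.T @ F2 / len(X)


def density(a, x1, x2):
    """Evaluate rho(x1, x2) = sum_j a_j f_j1(x1) f_j2(x2) pointwise."""
    m = a.shape[0] - 1
    return np.einsum("ni,ij,nj->n", basis(x1, m), a, basis(x2, m))


def conditional_mean(a, x1):
    """E[x2 | x1]: substitute x1, normalize, and read off the first moment.

    After substitution the density along x2 has coefficients c_j(x1);
    dividing by c_0 normalizes it. Since x = 1/2 + f_1(x) / (2 sqrt(3)),
    only c_1 / c_0 contributes to the expected value.
    Note: c_0 can get close to zero in poorly sampled regions (not handled here).
    """
    m = a.shape[0] - 1
    c = basis(x1, m) @ a
    return 0.5 + (c[:, 1] / c[:, 0]) / (2.0 * np.sqrt(3.0))


# Toy data (hypothetical): x2 depends on x1, both stay in [0,1].
rng = np.random.default_rng(0)
x1 = rng.random(20000)
x2 = 0.5 * (x1 + rng.random(20000))
a = fit_joint(np.column_stack([x1, x2]), m=3)

query = np.array([0.2, 0.5, 0.8])
print("estimated E[x2 | x1]:", conditional_mean(a, query))
print("exact for this toy model:", 0.5 * query + 0.25)
```

Because the same coefficient array `a` describes the joint density, the opposite direction (conditioning on `x2` to predict `x1`) follows by passing `a.T` to `conditional_mean`, which is the sense in which such a unit propagates in more than one direction. Higher moments such as variance could be read off analogously from `c[:, 2] / c[:, 0]`.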
