Biology-inspired joint distribution neurons based on Hierarchical Correlation Reconstruction allowing for multidirectional propagation of values and densities

Jarek Duda

TL;DR

The paper tackles the gap between artificial neural networks and biology by introducing joint-distribution neurons based on Hierarchical Correlation Reconstruction (HCR). It presents HCRNN, where neural units represent joint densities $\rho(\mathbf{x})$ via $a_{\mathbf{j}}$ coefficients, enabling multidirectional propagation of both values and probability densities, and supports local training via information bottleneck and tensor decomposition. Key contributions include a practical density-parametrization of neurons, conditional-density and density-propagation rules, a framework for mutual information estimation from mixed moments, and IB-based training strategies. The approach promises more robust, probabilistic processing and potential paths toward biology-inspired AI that can better handle uncertainty and bidirectional computation.

Abstract

Recently, a million biological neurons (BNN) turned out to play Pong better than modern RL methods~\cite{RL}, a reminder that biological networks are still qualitatively superior, e.g. in learning, flexibility and robustness. This suggests trying to improve current artificial networks, e.g. MLP/KAN, for better agreement with biological ones. An extension of the KAN approach is proposed, to neurons containing a model of the local joint distribution: $\rho(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ for $\mathbf{x} \in [0,1]^d$, adding interpretation and information flow control to KAN, and allowing three basic properties of biological neurons that are currently missing to be added gradually: 1) Biological axons propagate in both directions~\cite{axon}, while current artificial neurons focus on unidirectional propagation; joint distribution neurons can repair this by substituting some variables and obtaining conditional values/distributions for the remaining ones. 2) Animals show risk avoidance~\cite{risk}, which requires processing variance, and the real world generally calls for probabilistic models; the proposed neurons can also predict and propagate distributions as vectors of moments: (expected value, variance) or higher. 3) Biological neurons require local training; besides backpropagation, the proposed approach allows many additional ways, such as direct training, tensor decomposition, or finally the local and very promising information bottleneck. The proposed approach is very general and can also be used as an extension of softmax $\textrm{Pr}\propto \exp(-E)$, e.g. in transformer embeddings, into probability distributions working on a few moments $(a_j)$: $\rho(x)\approx \sum_j a_j f_j(x)$.
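The density model $\rho(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ from the abstract can be illustrated with a minimal sketch for $d=2$. It assumes rescaled Legendre polynomials as the orthonormal basis on $[0,1]$, rank-based normalization to nearly uniform variables, and coefficients estimated as sample means of basis products; the function names (`to_uniform`, `basis`, `estimate_moments`, `density`) and the synthetic data are hypothetical and chosen only for illustration, not taken from the paper.

```python
import numpy as np
from numpy.polynomial import legendre

def to_uniform(x):
    """Rank-normalize samples to nearly uniform values in (0, 1) (empirical CDF)."""
    ranks = np.argsort(np.argsort(x))
    return (ranks + 0.5) / len(x)

def basis(u, m):
    """Orthonormal polynomials on [0, 1]: f_j(u) = sqrt(2j+1) * P_j(2u - 1), j = 0..m."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    scale = np.sqrt(2.0 * np.arange(m + 1) + 1.0)
    return legendre.legvander(2.0 * u - 1.0, m) * scale  # shape (n, m+1)

def estimate_moments(u, v, m):
    """Assumed estimator: a_ij = mean over samples of f_i(u) * f_j(v)."""
    return basis(u, m).T @ basis(v, m) / len(u)

def density(a, u, v):
    """Evaluate rho(u, v) = sum_ij a_ij f_i(u) f_j(v) at paired points."""
    m = a.shape[0] - 1
    return np.einsum("ni,ij,nj->n", basis(u, m), a, basis(v, m))

# Synthetic, positively dependent data (an assumption for this example only).
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + 0.5 * rng.normal(size=2000)

a = estimate_moments(to_uniform(x), to_uniform(y), m=4)
rho = density(a, np.array([0.1, 0.9]), np.array([0.1, 0.1]))
```

In this sketch `a[0, 0]` equals 1 (normalization), `a[1, 0]` vanishes because the variables were normalized to uniform, and `a[1, 1]` is close to the rank correlation of the two variables, which is one way to read the coefficients as interpretable mixed moments.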

Paper Structure (15 sections, 43 equations, 13 figures)

This paper contains 15 sections, 43 equations, 13 figures.

Introduction
HCR neural networks (HCRNN)
Introduction to Hierarchical Correlation Reconstruction
Conditional distributions and expected value propagation
Propagation of probability distributions
Tensor decomposition and linearization
Basis optimization and selection
Some HCRNN training approaches
Information bottleneck based training
Information theory view on HCR
Kernel formulation like HSIC
Gradient descent optimization of information bottleneck
Modification of weights based on gradients
Modifying content of intermediate layers
Conclusions and further work

Figures (13)

Figure 1: As in the MLP/KAN approach, we search for logical neurons meant to extract the essential mathematics hidden in biological neurons, focusing on three of their properties that are usually missing: bidirectional propagation~\cite{axon}, working on distributions, e.g. for the observed risk avoidance~\cite{risk}, and local training such as information bottleneck~\cite{information}. All three are natural for neurons containing a joint distribution model, made practical by HCR approximating the joint density as a linear combination: $\rho(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$, with moments $a_\mathbf{j}$ as neuron parameters.
Figure 2: Basic formulas and an example for an HCR neuron with $d=2$ variables, using convenient normalization of variables to nearly uniform in $[0,1]$. The neuron contains a matrix of moments $a_{ij}$ (generally an order $d$ tensor), allowing propagation in various directions by substituting some variables and normalizing to get the estimated conditional density for the remaining ones; changing the propagation direction only requires permuting indexes. For value propagation we can take the expected value, which restricts prediction to the 1st moment and becomes a summation of trained polynomials as in KAN~\cite{kan}, allowing HCRNN to be viewed as an extension of KAN, e.g. with optional density propagation and change of propagation direction. Additionally, such $a_{ij}$ parameters are actual moments, allowing better interpretation and inexpensive estimation of mutual information, for training or monitoring of information flow.
Figure 3: Summary of differences between artificial (ANN) and biological neural networks (BNN, based on https://www.geeksforgeeks.org/difference-between-ann-and-bnn/). BNNs are qualitatively superior in terms of learning, flexibility and robustness; just increasing the number of neurons might be insufficient to reach them. To build ANNs closer to BNN capabilities, we should include their neuron-level properties, summarized in Fig. \ref{learn}, all possible for neurons containing a local joint distribution model, which allows approaching a complete statistical description of the information available to the neuron, as organisms build probabilistic models of the world with their networks.
Figure 4: Simple 2/3D examples from https://community.wolfram.com/groups/-/m/t/3241700 of propagation in any direction based on the shown datasets (points) as conditional expected values, here being degree $m=8$ polynomials.
Figure 5: The proposed HCR neuron and neural network (HCRN, HCRNN) containing a local joint distribution model represented by the tensor $(a_\mathbf{j})_{\mathbf{j}\in B}$, e.g. $(a_{ijk})_{i,j,k\in \{0,\ldots,m\}}$ for $d=3$ connections. Top: the orthonormal polynomial basis used, for uniform weight in $[0,1]$, convenient with such normalization to quantiles. Middle: HCR neuron containing and applying a joint distribution model, here for $d=3$ variables, with gathered formulas for direct estimation/model update and for its application to propagate entire distributions or expected values alone. Such a $\rho$ density parametrization can drop below 0, which is usually repaired by calibration, e.g. using the normalized $\max(\rho,0.1)$ density. However, for neural networks with inter-layer normalization this issue seems negligible, which substantially simplifies calculations to the shown formulas. When propagating only expected values and normalizing, we can use only the marked numerators, as in KAN optimizing nonlinear functions (polynomials here) by including only pairwise dependencies ($a$ with two nonzero indexes), and extend to their products to consciously include higher order dependencies. Bottom: schematic HCR neural network and some training approaches for intermediate layers, which in HCR can be treated as values or as their distributions (replacing $f_i(u)$ with its $i$-th moment: $\int_0^1 \rho(u) f_i(u) du$). The tensor decomposition approach is also visualized: estimate dependencies (e.g. pairwise) for multiple variables and try to automatically decompose them into multiple dependencies of smaller numbers of variables with algebraic methods.
...and 8 more figures
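
The propagation rules described in the captions of Figures 2 and 5 (substitute one variable, normalize, optionally take the expected value, and permute indexes to change direction) can be sketched for $d=2$ as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the basis is taken to be rescaled Legendre polynomials, the coefficient matrix `a` is hand-picked rather than trained, and the function names are hypothetical. The expected value uses the identity $v = \frac{1}{2} + \frac{1}{2\sqrt{3}} f_1(v)$ for this basis, and the calibration follows the normalized $\max(\rho, 0.1)$ mentioned in Figure 5.

```python
import numpy as np
from numpy.polynomial import legendre

def basis(u, m):
    """Orthonormal polynomials on [0, 1]: f_j(u) = sqrt(2j+1) * P_j(2u - 1), j = 0..m."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    scale = np.sqrt(2.0 * np.arange(m + 1) + 1.0)
    return legendre.legvander(2.0 * u - 1.0, m) * scale  # shape (n, m+1)

def conditional_expectation(a, u):
    """E[v | u] = 1/2 + (sum_i a_i1 f_i(u)) / (2 sqrt(3) sum_i a_i0 f_i(u))."""
    fu = basis(u, a.shape[0] - 1)
    return 0.5 + (fu @ a[:, 1]) / (2.0 * np.sqrt(3.0) * (fu @ a[:, 0]))

def conditional_density(a, u, grid, floor=0.1):
    """Calibrated rho(v | u) on a uniform grid of v, for a single value u."""
    fu = basis(u, a.shape[0] - 1)[0]
    coeffs = (fu @ a) / (fu @ a[:, 0])           # coefficients of f_j(v); coeffs[0] == 1
    raw = basis(grid, a.shape[1] - 1) @ coeffs   # may drop below 0
    calibrated = np.maximum(raw, floor)          # max(rho, 0.1) calibration
    return calibrated / calibrated.mean()        # renormalize: grid mean ~ integral

# Hand-picked moments for illustration only (not trained values).
a = np.zeros((3, 3))
a[0, 0] = 1.0   # normalization
a[1, 1] = 0.5   # dependence between first moments
a[1, 2] = 0.2   # first moment of u against second moment of v

forward = conditional_expectation(a, 1.0)     # E[v | u = 1]
backward = conditional_expectation(a.T, 1.0)  # E[u | v = 1], indexes permuted

grid = (np.arange(200) + 0.5) / 200           # midpoints of [0, 1]
dens = conditional_density(a, 0.0, grid)
```

Transposing `a` is all that is needed to reverse the direction in this two-variable case; for an order $d$ tensor the analogous step would be a permutation of axes.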
