Neural auto-association with optimal Bayesian learning

Andreas Knoblauch

TL;DR

The paper studies auto-associative neural memories within an optimal Bayesian learning framework and compares the optimal rule with approximate Bayesian rules such as BCPNN. It shows that violations of the naive Bayes assumptions, namely patterns with a constant number of active units, iterative retrieval, and winners-take-all selection, can produce anomalies in which BCPNN outperforms the nominally optimal rule; these are explained by noise assumptions that no longer match the actual noise as retrieval iterates. Adaptive Noise Estimation (ANE), which updates the noise estimates during iterative retrieval, significantly boosts performance, with the largest gains under core-retrieval strategies for Palm-pattern ensembles. The overall maximum storage capacity is attained by the Bayesian rule with ANE, while stabilized BCPNN remains robust under simpler conditions. These findings shed light on how retrieval dynamics and structure affect memory capacity in neural networks and suggest a potential link to neurobiological mechanisms such as short-term plasticity.

Abstract

Neural associative memories are single-layer perceptrons with fast synaptic learning, typically storing discrete associations between pairs of neural activity patterns. Previous works have analyzed the optimal networks under naive Bayes assumptions of independent pattern components and hetero-association, where the task is to learn associations from input to output patterns. Here I study the optimal Bayesian associative network for auto-association, where input and output layers are identical. In particular, I compare its performance to different variants of approximate Bayesian learning rules, like the BCPNN (Bayesian Confidence Propagation Neural Network), and try to explain why the suboptimal learning rules sometimes achieve higher storage capacity than the (theoretically) optimal model. It turns out that performance can depend on subtle dependencies of input components that violate the "naive Bayes" assumptions. These include patterns with a constant number of active units, iterative retrieval where patterns are repeatedly propagated through recurrent networks, and winners-take-all activation of the most probable units. Performance of all learning rules can improve significantly if they include a novel adaptive mechanism to estimate noise in iterative retrieval steps (ANE). The overall maximum storage capacity is again achieved by the Bayesian learning rule with ANE.

Paper Structure

This paper contains 11 sections, 20 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Output noise $\epsilon$ as a function of stored memories for networks of size $n=1024$, where each pattern has $k=\sqrt{n}=32$ one-entries, using one-step retrieval for queries with 10 percent input miss/add noise ($\lambda=0.9$, $\kappa=0.1$). Upper panels correspond to hetero-association, lower panels to auto-association. Left panels correspond to Willshaw patterns (independent components and noise). Right panels correspond to Palm patterns (fixed number of components and noise). Each panel shows results for various learning rules (see Knoblauch:NeurComp2011 for details). In particular, "B" corresponds to the Bayesian rule (\ref{eq:wij_Bayesian}, \ref{eq:xj_Bayesian}), BCPNN to (\ref{eq:wij_noisy_BCPNN}), BCPNN2 to (\ref{eq:wij_Bayesian_BCPNNII_noisy}), and BCPNN3 to (\ref{eq:wijBCPNNIII}); WTA = $k$-winners-take-all retrieval; th = theory. The numbers in the legend correspond to the (interpolated) pattern capacity $M_{\epsilon}$ at output noise level $\epsilon=0.01$. See text for further details.
  • Figure 2: Results for auto-association with iterative retrieval (max. 100 iterations) for Willshaw (left) and Palm patterns (right) for $n=1024$, $k=32$, $\lambda=0.9$, $\kappa=0.1$. A,B: Output noise $\epsilon$ as a function of stored memories $M$, similar to Fig. 1. C,D: Fraction $p_{\mathrm{corr}}$ of correct retrieval outputs (corresponding to zero output noise $\epsilon^{\mu}=0$). Numbers in the legends correspond to the (interpolated) pattern capacity $M_{p_\mathrm{corr}}$ at $p_{\mathrm{corr}}=0.9$. E,F: Mean number of iterations until convergence.
  • Figure 3: Pattern capacity $M_{\epsilon}$ at output noise level $\epsilon=0.01$. The experimental setup is as in Fig. 2, but for different network sizes $n=196, 361, 576, 1024$ and pattern activities $k=\sqrt{n}=14, 19, 24, 32$. Numbers in the legends correspond to the (interpolated) pattern capacities for each $n$.
  • Figure 4: Pattern capacity $M_{p_{\mathrm{corr}}}$ at correctness level $p_{\mathrm{corr}}=0.9$. The experimental setup is as in Figs. 2 and 3, for network sizes $n=196, 361, 576, 1024$ and pattern activities $k=\sqrt{n}=14, 19, 24, 32$. Numbers in the legends correspond to the (interpolated) pattern capacities for each $n$.
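
To make the experimental setup in the captions above concrete, the following is a minimal sketch of how the two pattern ensembles and a noisy query might be generated. It is not the author's code, and several points are assumptions: $\lambda$ is read as the fraction of one-entries kept in the query, $\kappa$ as the number of false one-entries relative to $k$, and the output noise $\epsilon$ as the Hamming distance normalized by the number of one-entries in the stored pattern. All function names are hypothetical.

```python
import numpy as np


def willshaw_patterns(M, n, k, rng):
    """Willshaw-style ensemble: each component is independently 1 with prob. k/n."""
    return (rng.random((M, n)) < k / n).astype(np.uint8)


def palm_patterns(M, n, k, rng):
    """Palm-style ensemble: every pattern has exactly k one-entries."""
    patterns = np.zeros((M, n), dtype=np.uint8)
    for mu in range(M):
        patterns[mu, rng.choice(n, size=k, replace=False)] = 1
    return patterns


def noisy_query_fixed(pattern, lam, kappa, rng):
    """Query with a fixed amount of miss/add noise (assumed reading of lambda, kappa).

    Keeps round(lam * k) of the k original one-entries and adds
    round(kappa * k) false one-entries at previously silent positions.
    """
    ones = np.flatnonzero(pattern)
    zeros = np.flatnonzero(pattern == 0)
    k = ones.size
    keep = rng.choice(ones, size=int(round(lam * k)), replace=False)
    add = rng.choice(zeros, size=int(round(kappa * k)), replace=False)
    query = np.zeros_like(pattern)
    query[keep] = 1
    query[add] = 1
    return query


def output_noise(retrieved, original):
    """Assumed definition: Hamming distance divided by the number of one-entries."""
    return np.sum(retrieved != original) / max(1, np.sum(original))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 1024, 32  # sizes taken from the figure captions
    stored = palm_patterns(10, n, k, rng)
    query = noisy_query_fixed(stored[0], lam=0.9, kappa=0.1, rng=rng)
    print("noise of the raw query:", output_noise(query, stored[0]))
```

The fixed-count noise model matches the Palm setting ("fixed number of components and noise"); for the Willshaw setting one would instead flip each component independently, which this sketch leaves out.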