Neural auto-association with optimal Bayesian learning
Andreas Knoblauch
TL;DR
The paper investigates auto-associative neural memories through an optimal Bayesian learning framework and contrasts the optimal rule with approximate Bayesian rules such as BCPNN. It shows that deviations from the naive Bayes assumptions, namely patterns with a constant number of active units, iterative retrieval, and winner-take-all selection, can produce anomalies in which BCPNN outperforms the nominally optimal rule; these anomalies are explained by noise assumptions that no longer match the actual noise as retrieval iterates. Introducing Adaptive Noise Estimation (ANE), which updates the noise estimates during iterative retrieval, significantly boosts performance, with the largest gains seen under core-retrieval strategies for Palm-pattern ensembles. The overall maximum storage capacity is attained by the Bayesian rule with ANE, while stabilized BCPNN remains robust under simpler conditions. These findings shed light on how retrieval dynamics and pattern structure affect memory capacity in neural networks and offer a potential link to neurobiological mechanisms such as short-term plasticity.
Abstract
Neural associative memories are single-layer perceptrons with fast synaptic learning, typically storing discrete associations between pairs of neural activity patterns. Previous work has analyzed the optimal networks under naive Bayes assumptions of independent pattern components and heteroassociation, where the task is to learn associations from input to output patterns. Here I study the optimal Bayesian associative network for auto-association, where input and output layers are identical. In particular, I compare its performance to different variants of approximate Bayesian learning rules, like the BCPNN (Bayesian Confidence Propagation Neural Network), and try to explain why the suboptimal learning rules sometimes achieve higher storage capacity than the (theoretically) optimal model. It turns out that performance can depend on subtle dependencies between input components that violate the "naive Bayes" assumptions. These include patterns with a constant number of active units, iterative retrieval where patterns are repeatedly propagated through recurrent networks, and winner-take-all activation of the most probable units. Performance of all learning rules can improve significantly if they include a novel adaptive mechanism to estimate noise in iterative retrieval steps (ANE). The overall maximum storage capacity is again achieved by the Bayesian learning rule with ANE.
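To make the retrieval setting concrete, the following is a minimal sketch of an auto-associative memory with iterative retrieval and k-winner-take-all activation. It is an illustration under stated assumptions, not the method of the paper: it uses a simple clipped Hebbian (Willshaw-type) storage rule in place of the Bayesian or BCPNN rules, and it does not implement ANE. The function names (`store`, `k_wta`, `retrieve`) and the toy patterns are hypothetical.

```python
def to_vec(active, n):
    """Binary pattern of length n with ones at the given unit indices."""
    return [1 if i in active else 0 for i in range(n)]

def store(patterns, n):
    """Clipped Hebbian storage (assumption: Willshaw-type rule, not the
    Bayesian rule of the paper). W[i][j] = 1 if units i and j were
    co-active in at least one stored pattern; self-connections included."""
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        active = [i for i in range(n) if p[i]]
        for i in active:
            for j in active:
                W[i][j] = 1
    return W

def k_wta(potentials, k):
    """Activate the k units with the largest potential (ties broken by index)."""
    order = sorted(range(len(potentials)), key=lambda i: (-potentials[i], i))
    winners = set(order[:k])
    return [1 if i in winners else 0 for i in range(len(potentials))]

def retrieve(W, cue, k, steps=5):
    """Iterative retrieval: propagate the pattern through the recurrent
    network repeatedly until it stops changing or `steps` is reached."""
    n = len(cue)
    x = list(cue)
    for _ in range(steps):
        potentials = [sum(W[i][j] for j in range(n) if x[j]) for i in range(n)]
        new_x = k_wta(potentials, k)
        if new_x == x:
            break
        x = new_x
    return x

n, k = 12, 3  # toy sizes; every pattern has exactly k active units
patterns = [to_vec({0, 1, 2}, n), to_vec({3, 4, 5}, n), to_vec({6, 7, 8}, n)]
W = store(patterns, n)

# Noisy cue for the first pattern: unit 2 is missing, unit 9 is spurious.
cue = to_vec({0, 1, 9}, n)
recalled = retrieve(W, cue, k)
print(recalled == patterns[0])
```

Because every stored pattern has exactly `k` active units and retrieval always selects exactly `k` winners, the components are not independent, which is the kind of dependency the abstract identifies as violating the naive Bayes assumptions.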
