Redundancy Maximization as a Principle of Associative Memory Learning

Mark Blümel, Andreas C. Schneider, Valentin Neuhaus, David A. Ehrlich, Marcel Graetz, Michael Wibral, Abdullah Makkeh, Viola Priesemann

TL;DR

The paper investigates how local information processing in Hopfield networks enables associative memory, using Partial Information Decomposition (PID) to dissect how recurrent and teaching inputs contribute to each neuron's output. It finds that redundancy between these inputs dominates below memory capacity, which motivates infomorphic neurons that maximize redundancy as a local learning goal. This approach yields a memory capacity of $\alpha_c \approx 1.59$ for pure redundancy maximization and up to $\alpha_c \approx 1.70$ for optimized composite goals, more than ten times the classic Hebbian capacity of $\alpha_H \approx 0.14$ and ahead of other high-performance learning rules. The work establishes redundancy maximization as a principled design criterion for associative memories and points to future models with information-theoretic objectives, including extensions to correlated patterns and dense architectures.

Abstract

Associative memory, traditionally modeled by Hopfield networks, enables the retrieval of previously stored patterns from partial or noisy cues. Yet the local computational principles required to enable this function remain incompletely understood. To formally characterize the local information processing in such systems, we employ a recent extension of information theory, Partial Information Decomposition (PID). PID decomposes the contribution of different inputs to an output into unique information from each input, redundant information across inputs, and synergistic information that emerges from combining different inputs. Applying this framework to individual neurons in classical Hopfield networks, we find that below the memory capacity, the information in a neuron's activity is characterized by high redundancy between the external pattern input and the internal recurrent input, while synergy and unique information are close to zero until the memory capacity is surpassed and performance drops steeply. Inspired by this observation, we use redundancy as an information-theoretic learning goal that is directly optimized for each neuron. This dramatically increases the network's memory capacity to 1.59, a more than tenfold improvement over the 0.14 capacity of classical Hopfield networks, and even outperforms recent state-of-the-art implementations of Hopfield networks. Ultimately, this work establishes redundancy maximization as a new design principle for associative memories and opens pathways for new associative memory models based on information-theoretic goals.
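
The decomposition described in the abstract can be written compactly. As a sketch, write $Y$ for a neuron's output, $R$ for its internal recurrent input, and $T$ for the external pattern (target) input. The relations below are the standard two-source PID consistency equations, plus the usual split of the output entropy; which redundancy measure the paper uses to fix the atoms is not stated in this section, so none is assumed here.

```latex
\begin{align}
I(Y : R, T) &= \Pi_{\mathrm{unq},R} + \Pi_{\mathrm{unq},T} + \Pi_{\mathrm{red}} + \Pi_{\mathrm{syn}} \\
I(Y : R)    &= \Pi_{\mathrm{unq},R} + \Pi_{\mathrm{red}} \\
I(Y : T)    &= \Pi_{\mathrm{unq},T} + \Pi_{\mathrm{red}} \\
H(Y)        &= I(Y : R, T) + H(Y \mid R, T)
\end{align}
```

Three mutual-information equations constrain four atoms, so one atom (typically the redundancy) has to be defined separately. When the unique and synergistic atoms are close to zero, as reported below capacity, the three mutual-information terms all reduce to approximately $\Pi_{\mathrm{red}}$.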

Paper Structure

This paper contains 29 sections, 13 equations, 8 figures, 2 tables, 4 algorithms.

Figures (8)

  • Figure 1: For classical Hopfield networks trained with Hebbian learning, redundant information between target and recurrent input coincides with successful memory storage. A: Schematic of the analysis set-up for Hopfield networks. To measure how information is represented, each neuron is compared to a non-driving target input $T$ that provides the ground-truth pattern, in addition to its recurrent input $R$. B: Each neuron in the Hopfield network aggregates its recurrent inputs and produces an output $Y$. The neurons are initialized in the target state $\xi$. C: Partial information decomposition (PID) separates the entropy of the output $Y$ into five parts: unique information from each of the two inputs (provided by only that input), redundant information (shared by both inputs), synergistic information (emerging only from the combination of inputs), and residual entropy (not explained by the inputs). D: The PID profile as a function of memory load $\alpha$. Below the network's memory capacity ($\alpha_\mathrm{H} \approx 0.14$, indicated by the dashed black line), the redundancy $\Pi_{\mathrm{red}}$ is high. Above capacity, as recall fails, redundancy collapses and is replaced by unique information from the recurrent input. The accuracy of recall is shown in black. The PID profiles show the median across $20$ network initializations, with values first averaged across all neurons. The accuracy curve is the median across the $20$ initializations. Shaded areas indicate the central $90\,\%$ of values. Results are from a network with $N=500$ neurons to minimize finite-size effects (see the appendix on finite-size effects).
  • Figure 2: Redundancy maximization between recurrent connections and a target is a sufficient principle for memorization in Hopfield networks, achieving a memory capacity of $\alpha_\mathrm{c}^{\mathrm{red}} \approx 1.59$. A: Schematic of the infomorphic Hopfield model. During training of the infomorphic Hopfield network, the recurrent connections $\boldsymbol{w}_R$ are updated using gradient ascent on the goal function $G=\Pi_{\mathrm{red}}$. B: Each neuron in the infomorphic Hopfield network aggregates its input into two compartments: the recurrent input $R$ and the target input $T$. Based on these, it stochastically produces an output $Y$. C: Recall accuracy as a function of memory load $\alpha$ for a network with $100$ neurons. Redundancy maximization achieves a memory capacity of $\alpha_\mathrm{c}^{\mathrm{red}} = 1.59 \ [1.56, 1.61]$, far exceeding the Hebbian capacity of $\alpha_\mathrm{H} \approx 0.14$ (capacities marked by dashed lines). D: The PID profile shows the mean information atoms per neuron as a function of memory load $\alpha$. Redundancy dominates below capacity ($\alpha < \alpha_\mathrm{c}^{\mathrm{red}}$), then falls off as the capacity is crossed and the other atoms become non-zero. In C, the curve shows the median across $20$ network initializations. In D, values are first averaged across all neurons, and the curve then shows the median of these averages across the initializations. For both panels, the shaded area represents the central $90\,\%$ of the data, spanning the 5th to the 95th percentile.
  • Figure 3: Infomorphic Hopfield networks trained with a classical mutual information goal achieve high memory capacity by implicitly maximizing redundancy. A: The redundancy goal and two alternative learning objectives based on classical information theory: maximizing the mutual information between output and target, $G=I(Y:T)$, and maximizing the co-information, $G= I(Y : R :T)$. B: A performance comparison of the two alternative goals as a function of memory load shows that maximizing mutual information ($G=I(Y:T)$) achieves a capacity very similar to that of maximizing redundancy ($G=\Pi_{\mathrm{red}}$) alone. In contrast, maximizing co-information ($G= I(Y : R :T)$) fails to store any patterns. C: Information profiles for the successful $G=I(Y:T)$ goal. Beyond its memory capacity $\alpha_\mathrm{c}$, mutual information $I(Y:T)$ stays high, but redundancy ($\Pi_{\mathrm{red}}$) falls and is replaced by unique information from the target ($\Pi_{\mathrm{unq},T}$). D: The memory capacity $\alpha_\mathrm{c}$ as a function of $\gamma_{\mathrm{unq},T}$ in the goal $G = \gamma_{\mathrm{unq},T} \Pi_{\mathrm{unq},T} + \Pi_{\mathrm{red}}$. While positive values of $\gamma_{\mathrm{unq},T}$ have no strong effect on the memory capacity, negative values are detrimental. The curves in B and D show the median across $20$ network initializations, while in C, values are first averaged across all neurons. In all panels, the shaded area represents the central $90\,\%$ of the data.
  • Figure 4: Hyperparameter optimization reveals composite information goals which outperform redundancy maximization. A: The memory capacity landscape $\alpha_\mathrm{c}$ as a function of the goal parameters $\gamma_i$ reveals the performance across large parts of the parameter space. The goal parameters for redundancy $\gamma_{\mathrm{red}}$ and target unique information $\gamma_{\mathrm{unq},T}$ vary on the outer axes, while the goal parameters for synergy $\gamma_{\mathrm{syn}}$ and recurrent unique information $\gamma_{\mathrm{unq},R}$ vary on the inner axes. The remaining goal parameter for residual entropy $\gamma_{\mathrm{res}}$ is fixed at 0. B: A more detailed subspace of A reveals the area around the heuristic goal. Here, $\gamma_{\mathrm{unq},T} = 0$ and $\gamma_{\mathrm{red}} = 1$, while $\gamma_{\mathrm{unq},R}$ and $\gamma_{\mathrm{syn}}$ are varied. Suppressing both $\Pi_{\mathrm{unq},R}$ and $\Pi_{\mathrm{syn}}$ slightly improves the capacity above that of redundancy maximization. C: A direct optimization of the capacity on the full goal space reveals new goals. The heuristic redundancy goal as well as two of the best-performing goals are illustrated. The optimized goals reach a memory capacity of $\alpha^\mathrm{(i)}_\mathrm{c} = 1.68 \ [1.65, 1.72]$ and $\alpha^\mathrm{(ii)}_\mathrm{c} = 1.7 \ [1.66,1.71]$. The exact goal parameters and the results of the optimizations are listed in the appendix on optimization results. D: As in B, the panel shows two slices of the landscape around the optimized goals. The landscapes show a similar structure with distinct local optima.
  • Figure 5: In terms of memory capacity, the infomorphic approach outperforms other high-performance learning methods. A: Comparison of the performance of the Hebbian learning rule, our heuristic and optimized infomorphic goals, and two high-performance methods, the descent L2 method [Tolmachev_2020, DiedrichOpper] and minimum probability flow (MPF) [hillar2012efficient]. Using cosine similarity ($a_\mathrm{cos}$) as the accuracy metric, both infomorphic goals outperform all other methods. Above capacity, Hebbian learning displays spurious memories [amit1985spinglass]. B: Same as in A, but accuracy is measured using a strict threshold ($a_\theta$ with $\theta=0.95$). This metric reveals that the Hebbian learning rule is unable to fully reconstruct patterns above its capacity ($\alpha > \alpha_\mathrm{H}$), as its accuracy falls to zero. C: The stability of memory retrieval as a function of memory load $\alpha$. The figure is adapted from [Tolmachev_2020], with MPF and the infomorphic goals added. The infomorphic method is on par with the other methods, although it was not designed to optimize stability. The curves in A-D show the median values across $20$ network initializations, and the shaded areas denote the corresponding central $90\,\%$ of the data.
  • ...and 3 more figures
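
The Hebbian baseline described in the Figure 1 caption (neurons initialized in the target state $\xi$, recall accuracy measured as a function of memory load $\alpha$) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the synchronous update schedule, the cap of 50 sweeps, and the use of bitwise agreement as the accuracy measure are all choices made for this example (the captions mention cosine-similarity and thresholded accuracies instead).

```python
import numpy as np


def hebbian_weights(patterns):
    """Hebbian outer-product rule for +/-1 patterns, without self-connections."""
    n_neurons = patterns.shape[1]
    weights = patterns.T @ patterns / n_neurons
    np.fill_diagonal(weights, 0.0)
    return weights


def recall(weights, state, max_sweeps=50):
    """Run synchronous sign updates until a fixed point or max_sweeps.

    Assumption: synchronous updates are used for brevity; asynchronous
    updates are the more common textbook choice.
    """
    for _ in range(max_sweeps):
        new_state = np.where(weights @ state >= 0, 1, -1)
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state


def recall_accuracy(n_neurons, alpha, rng):
    """Mean fraction of correct bits when each stored pattern is the start state."""
    n_patterns = max(1, int(round(alpha * n_neurons)))
    patterns = rng.choice([-1, 1], size=(n_patterns, n_neurons))
    weights = hebbian_weights(patterns)
    per_pattern = [np.mean(recall(weights, xi.copy()) == xi) for xi in patterns]
    return float(np.mean(per_pattern))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for alpha in (0.05, 0.10, 0.14, 0.20, 0.50):
        print(f"alpha={alpha:.2f}  accuracy={recall_accuracy(200, alpha, rng):.3f}")
```

With random patterns, the accuracy should stay close to 1 for loads well below $\alpha_\mathrm{H} \approx 0.14$ and degrade above it; the exact values depend on the network size and the random seed.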