
A Federated Many-to-One Hopfield model for associative Neural Networks

Andrea Alessandrelli, Fabrizio Durante, Andrea Ladiana, Andrea Lepre

Abstract

Federated learning enables collaborative training without sharing raw data, but struggles under client heterogeneity and streaming distribution shifts, where drift and novel data can impair convergence and cause forgetting. We propose a federated associative-memory framework that learns shared archetypes in heterogeneous, continual settings, where client data are independent but not necessarily balanced. Each client encodes its experience as a low-rank Hebbian operator, sent to a central server for aggregation and factorization into global archetypes. This approach preserves privacy, avoids centralized replay buffers, and is robust to small, noisy, or evolving datasets. We cast aggregation as a low-rank-plus-noise spectral inference problem, deriving theoretical thresholds for detectability and retrieval robustness. An entropy-based controller balances stability and plasticity in streaming regimes. Experiments with heterogeneous clients, drift, and novelty show improved global archetype reconstruction and associative retrieval, supporting the spectral view of federated consolidation.
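The client-to-server flow described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Each client's operator is taken to be the zero-diagonal Hebbian average $J_c = (NM)^{-1}\sum_a \eta^a (\eta^a)^\top$ over its $M$ local examples. The server combines client operators with an exponential-moving-average carry-over of the previous round: the parameter `alpha` mirrors the $\alpha_{\mathrm{EMA}}$ reported for Figure 6. The sign-agreement weight is a hypothetical stand-in for the rule referenced in Figure 6. The helper names `client_hebbian`, `sign_agreement_weight` and `server_aggregate` are illustrative only.

```python
from typing import List, Optional

Matrix = List[List[float]]


def client_hebbian(examples: List[List[int]]) -> Matrix:
    """Local Hebbian operator J_c[i][j] = sum_a eta_i^a eta_j^a / (N * M), zero diagonal.

    Assumed normalisation: average over the M local examples times the 1/N
    Hebbian factor; the paper's exact scaling may differ.
    """
    m, n = len(examples), len(examples[0])
    J = [[0.0] * n for _ in range(n)]
    for eta in examples:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-couplings
                    J[i][j] += eta[i] * eta[j]
    return [[v / (n * m) for v in row] for row in J]


def sign_agreement_weight(J: Matrix, prev: Matrix) -> float:
    """Hypothetical per-client weight in [0, 1]: with a the fraction of off-diagonal
    entries whose sign matches the previous server operator, return max(0, 2a - 1).
    A client sending pure noise has a close to 1/2, hence a weight close to 0."""
    n = len(prev)
    agree = sum(1 for i in range(n) for j in range(n)
                if i != j and J[i][j] * prev[i][j] > 0)
    return max(0.0, 2.0 * agree / (n * (n - 1)) - 1.0)


def server_aggregate(client_mats: List[Matrix], prev: Optional[Matrix] = None,
                     alpha: float = 0.5) -> Matrix:
    """Round t = 0 (prev is None): plain average of the client operators.
    Round t > 0: alpha * prev + (1 - alpha) * (weighted average of client operators)."""
    n = len(client_mats[0])
    if prev is None:
        w = [1.0] * len(client_mats)
    else:
        w = [sign_agreement_weight(J, prev) for J in client_mats]
        if sum(w) == 0.0:  # no informative client: fall back to uniform weights
            w = [1.0] * len(client_mats)
    total = sum(w)
    avg = [[sum(wc * J[i][j] for wc, J in zip(w, client_mats)) / total
            for j in range(n)] for i in range(n)]
    if prev is None:
        return avg
    return [[alpha * prev[i][j] + (1 - alpha) * avg[i][j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    good = client_hebbian([[1, 1, -1], [1, 1, -1]])
    flipped = client_hebbian([[1, -1, 1], [1, -1, 1]])  # disagrees with the consensus
    J0 = server_aggregate([good, good])                 # round t = 0: clients only
    print([sign_agreement_weight(good, J0), sign_agreement_weight(flipped, J0)])  # [1.0, 0.0]
    J1 = server_aggregate([good, flipped], prev=J0)     # round t = 1: EMA carry-over
```

In the toy check, the client whose couplings mostly disagree in sign with the previous server operator receives zero weight. This matches, qualitatively, what Figure 6 reports for the attacker. Figure 6's good-client weights lie around 0.4 to 0.6, so the rule in Eq. \ref{eq:convex_comb} may combine these quantities differently from this sketch.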

Paper Structure

This paper contains 38 sections, 4 theorems, 157 equations, 9 figures, 1 table, and 3 algorithms.

Key Result

Theorem 1

Let $(\chi_j)_{j=1}^N$ be independent random variables. For a fixed communication round index $t$, let $\bm{J}_s^{(t)}$ be the server operator defined by Eq. \ref{eq:server_agg}. Then, for every deviation level $u>0$:

Figures (9)

  • Figure 1: Illustration of a toy Hopfield model with six neurons. The network is a fully connected graph whose edge weights are given by the Hebbian couplings collected in the matrix $\bm{J}$. In particular, the figure highlights the connection between $\sigma_1$ and $\sigma_6$, whose weight, by the Hebbian rule of Eq. \ref{eq:hebb}, is $N^{-1}\sum_{\mu=1}^K\xi^\mu_1\xi^\mu_6$.
  • Figure 2: Schematic illustration of the model for $L=3$ layers. Each layer $a\in \{1,2,3\}$ is a Hopfield network (with state $\boldsymbol{\sigma}^{(a)}$), and all layers share the same synaptic coupling matrix. The three contributions to the Hamiltonian in Eq. \ref{eq:lam_hamiltonian} are highlighted: imitative intra-layer interactions (green self-loop), anti-imitative inter-layer interactions (red links), and the coupling to an external field $\boldsymbol{h}^{(a)}$ (blue arrow).
  • Figure 3: Illustration of the federated pipeline. Each block represents a federation round $t$. At each round, the $L$ client layers provide their synaptic matrices as input, each estimated from the batch of examples available at that round. These matrices are combined in the aggregation layer. For $t=0$, the aggregation uses only the information received from the clients; for $t>0$, it combines (i) the client information from round $t$ and (ii) the server-side information carried over from round $t-1$. The aggregated information can be either sent back to the clients as feedback, to improve their estimates of the ground-truth synaptic matrix, or forwarded to the LAM layer, where pattern reconstruction is performed. At round $t=0$ there is no feedback from the federation to the clients. At each round, both the patterns reconstructed by the LAM layer and the updated client synaptic matrices are available. Panel (b) shows a zoom-in of the LAM layer, where pattern reconstruction is performed at each round. This layer takes the output of the aggregation layer as input and first applies an iterative algorithm to estimate $\mathbf{J}^{KS}$. It then generates a sufficiently large set of initial mixing states and feeds them into the LAM model, which collects candidate patterns. These candidates are passed to the pruning layers, which remove duplicates and apply a spectral criterion to discard forbidden candidates. The final output of the procedure is the set of $\hat{K}$ reconstructed patterns.
  • Figure 4: Zoom-in of the LAM layer, where pattern reconstruction is performed at each round. This layer takes the output of the aggregation layer as input and first applies an iterative algorithm to estimate $\mathbf{J}^{KS}$. It then generates a sufficiently large set of initial mixing states and feeds them into the LAM model, which collects candidate patterns. These candidates are passed to the pruning layers, which remove duplicates and apply a spectral criterion to discard forbidden candidates. The final output of the procedure is the set of $\hat{K}$ reconstructed patterns.
  • Figure 6: Per-client adaptive weight dynamics in federated unsupervised learning with adversarial noise. (A) Temporal evolution of the adaptive weight $w_c(t)$ (Eq. \ref{eq:convex_comb}) for good clients (blue, $r=0.9$) and one attacker client receiving pure noise (vermilion dashed, $r\simeq0$). The weight update rule is based on the normalized sign agreement between the client's local Hebbian correlator $J^{(t)}_c$ and the server's reconstruction operator from the previous round (Section \ref{subsec:plasticity_w}). Good clients maintain stable non-zero weights ($w_{\mathrm{good}} \approx 0.4$--$0.6$), reflecting high-quality local data. The attacker's weight decays rapidly to zero ($w_{\mathrm{att}} \to 0$ within $\sim$10 rounds), effectively down-weighting its noisy contribution to the server aggregation. Shaded regions represent the standard error over $S=20$ independent seeds. (B) Server-side retrieval quality: per-archetype magnetization $m_k(t)$ (colored lines, $k=1,2,3$) and mean retrieval $\langle m \rangle$ (black line). The system maintains high reconstruction fidelity ($\langle m \rangle \gtrsim 0.7$) despite the presence of one attacker, indicating that the adaptive weighting scheme is robust to a single noisy client. Parameters: $N=1000$ neurons, $K=3$ archetypes, $L=5$ clients (4 good + 1 attacker), $T=10$ rounds, $M=800$ examples/client/round, $\alpha_{\mathrm{EMA}}=0.5$.
  • ...and 4 more figures
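The Hebbian couplings of Figure 1, the zero-temperature retrieval they support, the magnetization $m_k$ plotted in Figure 6, and the duplicate-removal step of the pruning layer (Figures 3 and 4) can be illustrated together with a standard single-layer Hopfield sketch. Several assumptions apply: synchronous sign updates with ties broken to $+1$, zero self-couplings, and a hypothetical overlap threshold of 0.9 for declaring two candidates duplicates. The multi-layer LAM dynamics of Figure 2 and the spectral pruning criterion are not modelled, and all function names are illustrative.

```python
import random
from typing import List

Pattern = List[int]


def hebbian_couplings(patterns: List[Pattern]) -> List[List[float]]:
    """J_ij = N^{-1} sum_mu xi_i^mu xi_j^mu, with self-couplings J_ii set to 0."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]


def retrieve(J: List[List[float]], sigma: Pattern, max_sweeps: int = 20) -> Pattern:
    """Synchronous zero-temperature dynamics sigma_i <- sign(sum_j J_ij sigma_j),
    ties -> +1; stops at a fixed point or after max_sweeps updates."""
    n = len(sigma)
    for _ in range(max_sweeps):
        new = [1 if sum(J[i][j] * sigma[j] for j in range(n)) >= 0 else -1
               for i in range(n)]
        if new == sigma:
            break
        sigma = new
    return sigma


def magnetization(xi: Pattern, sigma: Pattern) -> float:
    """Overlap m = N^{-1} sum_i xi_i sigma_i between an archetype and a network state."""
    return sum(a * b for a, b in zip(xi, sigma)) / len(xi)


def prune_duplicates(candidates: List[Pattern], threshold: float = 0.9) -> List[Pattern]:
    """Keep a candidate only if its |overlap| with every kept candidate is below threshold.
    The absolute value also discards sign-flipped copies, since the update rule is odd
    in sigma (up to ties)."""
    kept: List[Pattern] = []
    for c in candidates:
        if all(abs(magnetization(k, c)) < threshold for k in kept):
            kept.append(c)
    return kept


if __name__ == "__main__":
    random.seed(0)
    N, K = 100, 3
    archetypes = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
    J = hebbian_couplings(archetypes)

    def corrupt(p: Pattern, flips: int = 10) -> Pattern:
        idx = set(random.sample(range(N), flips))
        return [-x if i in idx else x for i, x in enumerate(p)]

    # Two noisy starts per archetype plus one sign-flipped start.
    starts = [corrupt(p) for p in archetypes for _ in range(2)]
    starts.append([-x for x in corrupt(archetypes[0])])
    candidates = [retrieve(J, s) for s in starts]
    print([round(max(abs(magnetization(p, c)) for p in archetypes), 2) for c in candidates])
    print(len(prune_duplicates(candidates)), "distinct candidates")
```

At this low load ($K/N = 0.03$), each corrupted start falls back onto its archetype, so the seven candidates collapse to $K$ distinct patterns after pruning.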

Theorems & Definitions (12)

  • Theorem 1: Concentration
  • Theorem 2: Detection thresholds for the spiked covariance operator
  • Remark 1: Scope of Theoretical Guarantees
  • Proof
  • Remark 2
  • Theorem 3: MP universality + isotropic global law
  • Proof
  • Remark 3
  • Lemma 1: Uniform quantitative decoupling
  • Proof
  • ...and 2 more
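The statements of Theorems 2 and 3 are not reproduced in this summary. For orientation only, the sketch below computes the classical Marchenko-Pastur (MP) bulk edges and the Baik, Ben Arous and Péché (BBP) detection threshold for a rank-one spiked sample covariance. It assumes isotropic noise of variance `sigma2`, a single spike of strength `theta`, and the proportional regime $\gamma = N/M$. How these expressions specialize to the federated aggregate $\bm{J}_s^{(t)}$ (for example, which effective sample count enters $\gamma$) is an assumption and may differ from the paper's thresholds.

```python
import math
from typing import Optional, Tuple


def mp_edges(gamma: float, sigma2: float = 1.0) -> Tuple[float, float]:
    """Edges of the Marchenko-Pastur bulk for a sample covariance with noise variance
    sigma2 and aspect ratio gamma = N / M (dimension over number of samples)."""
    r = math.sqrt(gamma)
    return sigma2 * (1 - r) ** 2, sigma2 * (1 + r) ** 2


def bbp_outlier(theta: float, gamma: float, sigma2: float = 1.0) -> Optional[float]:
    """Population covariance sigma2 * (I + theta * v v^T). Above the BBP threshold
    theta > sqrt(gamma), the top sample eigenvalue converges to
    sigma2 * (1 + theta) * (1 + gamma / theta), outside the bulk. Below it, the top
    eigenvalue sticks to the upper edge (returned as None: no detectable outlier)."""
    if theta <= math.sqrt(gamma):
        return None
    return sigma2 * (1 + theta) * (1 + gamma / theta)


def bbp_overlap2(theta: float, gamma: float) -> float:
    """Limiting squared overlap |<v_hat, v>|^2 between the top sample eigenvector and
    the spike direction: (1 - gamma / theta^2) / (1 + gamma / theta) above threshold,
    0 below."""
    if theta <= math.sqrt(gamma):
        return 0.0
    return (1 - gamma / theta ** 2) / (1 + gamma / theta)


if __name__ == "__main__":
    gamma = 0.25  # illustrative aspect ratio, e.g. N = 1000, M = 4000
    print("MP bulk:", mp_edges(gamma))
    for theta in (0.3, 0.5, 1.0, 2.0):
        print(theta, bbp_outlier(theta, gamma), round(bbp_overlap2(theta, gamma), 3))
```

At the threshold $\theta = \sqrt{\gamma}$, the outlier location coincides with the upper MP edge $(1+\sqrt{\gamma})^2$ and the squared overlap vanishes. This is the sense in which a spike is "detectable" only once it clears the bulk.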