Improved Robustness and Hyperparameter Selection in the Dense Associative Memory

Hayden McAlister, Anthony Robins, Lech Szymanski

TL;DR

A modification to the Dense Associative Memory that mitigates floating point overflow also greatly improves hyperparameter selection: the optimal region of hyperparameters no longer changes significantly with the interaction vertex, as it does in the original network.

Abstract

The Dense Associative Memory generalizes the Hopfield network by allowing for sharper interaction functions. This increases the capacity of the network as an autoassociative memory, as nearby learned attractors will not interfere with one another. However, the implementation of the network relies on applying large exponents to the dot product of memory vectors and probe vectors. If the dimension of the data is large, the result of this calculation can be very large, leading to imprecision and overflow when using floating point numbers in a practical implementation. We describe the computational issues in detail, modify the original network description to mitigate the problem, and show the modification will not alter the network's dynamics during update or training. We also show our modification greatly improves hyperparameter selection for the Dense Associative Memory, removing dependence on the interaction vertex and resulting in an optimal region of hyperparameters that does not significantly change with the interaction vertex as it does in the original network.
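
The overflow described in the abstract can be reproduced in a few lines. The following is a minimal sketch rather than the authors' implementation: it assumes bipolar (+1/-1) states, a plain polynomial interaction function x^n, and, as an illustrative mitigation, dividing the dot product by the dimension before applying the exponent. The names `naive_similarity` and `scaled_similarity` are hypothetical.

```python
import numpy as np

def naive_similarity(memory, probe, n):
    """Apply the exponent n directly to the raw dot product."""
    raw = memory @ probe
    # Suppress the overflow warning so the inf result can be inspected.
    with np.errstate(over="ignore"):
        return np.power(raw, n)

def scaled_similarity(memory, probe, n):
    """Assumed mitigation: normalise the dot product by the dimension first,
    so the base of the power lies in [-1, 1] for bipolar states."""
    raw = memory @ probe
    return np.power(raw / memory.shape[0], n)

rng = np.random.default_rng(0)
dimension = 10_000
n = 100  # interaction vertex (the exponent)

memory = rng.choice([-1.0, 1.0], size=dimension)
probe = memory.copy()  # perfect match: dot product equals the dimension

naive = naive_similarity(memory, probe, n)    # 10000**100 exceeds float64 range
scaled = scaled_similarity(memory, probe, n)  # (10000 / 10000)**100

print("naive :", naive)
print("scaled:", scaled)
```

With a perfectly matching probe the raw dot product equals the dimension, so the naive power is far beyond the largest finite 64-bit float, while the normalised value stays bounded by 1 in magnitude for any exponent.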

Paper Structure

This paper contains 18 sections, 4 theorems, 20 equations, 33 figures.

Key Result

Lemma 4.1.1

The polynomial interaction function is homogeneous.
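
The equation itself is not reproduced in this summary, so the following worked step assumes the standard polynomial form F_n(x) = x^n with interaction vertex n. Under that assumption, homogeneity of degree n follows directly, and it suggests why rescaling the argument by a positive constant would leave the sign of an energy difference unchanged:

```latex
% Assumed form of the polynomial interaction function (not quoted from the paper)
F_n(x) = x^n
% Homogeneity of degree n: a scale factor \alpha factors out of the function
F_n(\alpha x) = (\alpha x)^n = \alpha^n x^n = \alpha^n F_n(x)
% Consequence for \alpha > 0: a common positive factor does not change the sign
% of a difference of interaction terms
\operatorname{sign}\bigl(F_n(\alpha a) - F_n(\alpha b)\bigr)
  = \operatorname{sign}\bigl(\alpha^n \left(F_n(a) - F_n(b)\right)\bigr)
  = \operatorname{sign}\bigl(F_n(a) - F_n(b)\bigr)
```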

Figures (33)

  • Figure 1: Coarse hyperparameter search space for the original network, measuring the Euclidean distance between learned states and relaxed states over various interaction vertices. Smaller distances correspond to better recall and hence a better associative memory.
  • Figure 2: Fine hyperparameter search space for the original network, measuring the Euclidean distance between learned states and relaxed states over various interaction vertices. Smaller distances correspond to better recall and hence a better associative memory.
  • Figure 3: Coarse hyperparameter search space for the modified network, measuring the Euclidean distance between learned states and relaxed states over various interaction vertices. Smaller distances correspond to better recall and hence a better associative memory.
  • Figure 4: Fine hyperparameter search space for the modified network, measuring the Euclidean distance between learned states and relaxed states over various interaction vertices. Smaller distances correspond to better recall and hence a better associative memory.
  • Figure 5: Hyperparameter search space for the original network, measuring the validation F1 score on the MNIST dataset. A larger F1 score corresponds to a better performing network.
  • ...and 28 more figures

Theorems & Definitions (8)

  • Lemma 4.1.1
  • proof
  • Lemma 4.1.2
  • proof
  • Theorem 4.2.1
  • proof
  • Theorem 4.3.1
  • proof