Lie Access Neural Turing Machine

Greg Yang

TL;DR

The paper introduces the Lie Access Neural Turing Machine (LANTM), an external-memory architecture in which memory keys live in a Euclidean space and heads move either to a new address (random access) or by a Lie group action (Lie access), so that memory manipulation stays continuous and differentiable. A key contribution is the InvNorm read scheme, which retrieves memory by inverse-distance weighting in the Euclidean key space; with it, LANTM generalizes to sequences longer than those seen in training while using far fewer parameters than the corresponding LSTM baseline. LANTM-InvNorm excels on permutation, arithmetic, and program-like tasks, whereas SoftMax-based reading underperforms on several benchmarks, which highlights the importance of distance-based key addressing in this setting. The authors also discuss generalizing to other manifolds (e.g., the Poincaré disk) to control key growth, and outline future directions for integrating structured reasoning with neural memory in continuous domains.
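The idea of controlling key growth by moving to the Poincaré disk can be illustrated with a small sketch. It assumes the standard curvature -1 Poincaré disk and uses Möbius addition as the hyperbolic stand-in for Euclidean translation; this is one common choice, not necessarily the paper's parametrization, and `mobius_add` and `poincare_dist` are hypothetical helper names.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]

def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1]

def mobius_add(a: Vec, x: Vec) -> Vec:
    # Möbius addition a ⊕ x on the Poincaré disk (curvature -1). The left
    # translation x -> a ⊕ x is an isometry of the disk, the hyperbolic
    # analogue of the Euclidean translation x -> x + a.
    ax, aa, xx = dot(a, x), dot(a, a), dot(x, x)
    ca = 1.0 + 2.0 * ax + xx   # coefficient on a
    cx = 1.0 - aa              # coefficient on x
    den = 1.0 + 2.0 * ax + aa * xx
    return ((ca * a[0] + cx * x[0]) / den, (ca * a[1] + cx * x[1]) / den)

def poincare_dist(u: Vec, v: Vec) -> float:
    # Hyperbolic distance between two points strictly inside the unit disk.
    diff = (u[0] - v[0], u[1] - v[1])
    return math.acosh(1.0 + 2.0 * dot(diff, diff) / ((1.0 - dot(u, u)) * (1.0 - dot(v, v))))

step: Vec = (0.5, 0.0)
euclid: Vec = (0.0, 0.0)
hyper: Vec = (0.0, 0.0)
for _ in range(10):
    euclid = (euclid[0] + step[0], euclid[1] + step[1])  # Euclidean translation
    hyper = mobius_add(step, hyper)                      # Möbius translation
print(math.hypot(*euclid))               # 5.0: Euclidean key norm grows without bound
print(math.hypot(*hyper) < 1.0)          # True: the key stays inside the disk
print(poincare_dist((0.0, 0.0), hyper))  # about 10 * ln(3), roughly 10.99
```

Repeated Euclidean translation pushes the key norm toward infinity, while the Möbius translation keeps every key inside the unit disk. The hyperbolic distance still grows by ln 3 per step here, so well-separated memories remain distinguishable.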

Abstract

Following the recent trend in explicit neural memory structures, we present a new design of an external memory, wherein memories are stored in a Euclidean key space $\mathbb R^n$. An LSTM controller performs read and write via specialized read and write heads. It can move a head by either providing a new address in the key space (aka random access) or moving from its previous position via a Lie group action (aka Lie access). In this way, the "L" and "R" instructions of a traditional Turing Machine are generalized to arbitrary elements of a fixed Lie group action. For this reason, we name this new model the Lie Access Neural Turing Machine, or LANTM. We tested two different configurations of LANTM against an LSTM baseline in several basic experiments. We found the right configuration of LANTM to outperform the baseline in all of our experiments. In particular, we trained LANTM on addition of $k$-digit numbers for $2 \le k \le 16$, but it was able to generalize almost perfectly to $17 \le k \le 32$, all with 2 orders of magnitude fewer parameters than the LSTM baseline.
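The contrast between random access and Lie access can be made concrete. The following is a minimal sketch, assuming a 2-D key space and two example Lie groups acting on it, translations $(\mathbb R^2, +)$ and rotations $SO(2)$; the function names are hypothetical, and the controller that would emit these moves is omitted.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]

def random_access(candidate: Vec) -> Vec:
    # Random access: the controller emits an absolute address and the head jumps there.
    return candidate

def translate(head: Vec, v: Vec) -> Vec:
    # Lie access with the translation group (R^2, +): shift the head by v.
    return (head[0] + v[0], head[1] + v[1])

def rotate(head: Vec, theta: float) -> Vec:
    # Lie access with the rotation group SO(2): rotate the head about the origin.
    c, s = math.cos(theta), math.sin(theta)
    return (c * head[0] - s * head[1], s * head[0] + c * head[1])

# The Turing-machine "L" and "R" moves are two particular translations.
LEFT: Vec = (-1.0, 0.0)
RIGHT: Vec = (1.0, 0.0)

head: Vec = (0.0, 0.0)
for _ in range(3):
    head = translate(head, RIGHT)     # three "R" moves
head = translate(head, LEFT)          # one "L" move
print(head)                           # (2.0, 0.0)
head = translate(head, (0.5, -0.25))  # any element of R^2 is a valid move
print(head)                           # (2.5, -0.25)
head = random_access((7.0, 3.0))      # jump directly to a new address
print(head)                           # (7.0, 3.0)
```

Because group actions compose (two translations equal one translation by the summed vector, two rotations equal one rotation by the summed angle), a head can learn relative moves that do not depend on where in the key space it currently sits.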

Paper Structure

This paper contains 32 sections, 12 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: LSTM schematics, with and without external memory. A plain LSTM is illustrated by the undashed part of the diagram. LSTM as a controller of an external memory is illustrated by including the dashed parts. The $\succ$ gate indicates concatenating inputs and applying a linear transformation given by the weights of the network. The $\prec$ gate indicates the splitting of a vector. $F$ is any processing of $h^{(t)}$ to produce the final output $y^{(t)}$, e.g. a softmax to produce a distribution over vocabulary.
  • Figure 2: Retrieval of value from memory via a key. Weightings with unit sum are assigned to different memories depending on the distances from the addresses to the read key. The weighted arithmetic mean is emitted as the final read value. Both InvNorm and SoftMax schemes follow this method, but each with a different way of computing the weightings. In particular, the SoftMax scheme requires another input, the temperature $T^{(t)}$.
  • Figure 3: Addressing mechanism.
  • Figure 4: Summary of controller interaction with external memories. The dashed boxes correspond to the dashed parts in Figure 1. All inputs, outputs, and LSTM states other than $\rho^{(t)}$ are omitted.
  • Figure 5: Example input/output schematic.
  • ...and 5 more figures
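The read described in Figure 2 (unit-sum weights from key-to-address distances, followed by a weighted arithmetic mean) can be sketched as follows. This is a minimal sketch under stated assumptions: both schemes use squared Euclidean distance, InvNorm weights are proportional to inverse squared distance (with a small `eps` to avoid division by zero), and SoftMax weights are a softmax over negative squared distance divided by the temperature $T^{(t)}$. The exact forms in the paper may differ, and `KeyValueMemory` with its append-only `write` is a hypothetical illustration, not the paper's interface.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def invnorm_weights(key: Vector, addresses: Sequence[Vector], eps: float = 1e-9) -> List[float]:
    # Assumed InvNorm form: weight_i proportional to 1 / ||key - address_i||^2.
    inv = [1.0 / (sq_dist(key, a) + eps) for a in addresses]
    total = sum(inv)
    return [w / total for w in inv]

def softmax_weights(key: Vector, addresses: Sequence[Vector], temperature: float) -> List[float]:
    # Assumed SoftMax form: weight_i proportional to exp(-||key - address_i||^2 / T).
    logits = [-sq_dist(key, a) / temperature for a in addresses]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_mean(weights: Sequence[float], values: Sequence[Vector]) -> List[float]:
    # Weighted arithmetic mean of the stored value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

class KeyValueMemory:
    """Hypothetical memory: each write appends an (address, value) pair."""

    def __init__(self) -> None:
        self.addresses: List[List[float]] = []
        self.values: List[List[float]] = []

    def write(self, address: Vector, value: Vector) -> None:
        self.addresses.append(list(address))
        self.values.append(list(value))

    def read(self, key: Vector, scheme: str = "invnorm", temperature: float = 1.0) -> List[float]:
        if scheme == "invnorm":
            w = invnorm_weights(key, self.addresses)
        elif scheme == "softmax":
            w = softmax_weights(key, self.addresses, temperature)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        return weighted_mean(w, self.values)

mem = KeyValueMemory()
mem.write((0.0, 0.0), (1.0, 0.0))
mem.write((1.0, 0.0), (0.0, 1.0))
print(mem.read((0.25, 0.0)))                   # InvNorm: close to [0.9, 0.1]
print(mem.read((0.25, 0.0), "softmax", 1.0))   # SoftMax, T = 1: about [0.62, 0.38]
print(mem.read((0.25, 0.0), "softmax", 0.01))  # small T: nearly [1.0, 0.0]
```

In this sketch InvNorm sharpens automatically as the read key approaches a stored address, because its weight diverges there. SoftMax instead needs the controller to supply a suitable temperature to achieve the same sharpness.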