Lie Access Neural Turing Machine
Greg Yang
TL;DR
The paper introduces the Lie Access Neural Turing Machine (LANTM), an external-memory architecture in which memory keys live in a Euclidean space and heads are moved either by random access (supplying a new address) or by Lie-group actions, so that memory manipulation stays continuous and differentiable. A key contribution is the InvNorm read scheme, which retrieves memory by inverse-distance weighting in the Euclidean key space; with it, LANTM generalizes to longer sequences better than a corresponding LSTM baseline while using far fewer parameters. LANTM-InvNorm excels on permutation, arithmetic, and program-like tasks, whereas SoftMax-based reading underperforms on several benchmarks, which highlights the importance of distance-based key addressing in this setting. The authors also discuss generalizing to other manifolds (e.g., the Poincaré disk) to control key growth, and outline future directions for integrating structured reasoning with neural memory in continuous domains.
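To make the inverse-distance read concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names `invnorm_weights` and `invnorm_read`, the use of the squared distance as the exponent, and the handling of a query that lands exactly on a key are all assumptions made for illustration.

```python
def invnorm_weights(query, keys, eps=1e-12):
    """Normalized inverse-squared-distance weights of `query` against each key.

    Assumption: the exponent 2 and the exact-match rule are illustrative choices.
    """
    sq_dists = [sum((q - k) ** 2 for q, k in zip(query, key)) for key in keys]
    # If the query coincides with one or more keys, split all weight among them
    # to avoid dividing by zero.
    exact = [d <= eps for d in sq_dists]
    if any(exact):
        n = sum(exact)
        return [1.0 / n if e else 0.0 for e in exact]
    inv = [1.0 / d for d in sq_dists]
    total = sum(inv)
    return [w / total for w in inv]


def invnorm_read(query, keys, values):
    """Read a value as the weighted average of stored values."""
    weights = invnorm_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Three memories in a 2-D key space, each storing a one-hot value.
keys = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
values = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# A query close to the second key reads back mostly the second value.
read_out = invnorm_read((0.9, 0.0), keys, values)
```

Because the weights depend only on distances in key space, a read head that lands near a stored key recovers nearly that key's value, regardless of how many other memories sit far away.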
Abstract
Following the recent trend in explicit neural memory structures, we present a new design of an external memory, wherein memories are stored in a Euclidean key space $\mathbb R^n$. An LSTM controller performs reads and writes via specialized read and write heads. It can move a head either by providing a new address in the key space (aka random access) or by moving from its previous position via a Lie group action (aka Lie access). In this way, the "L" and "R" instructions of a traditional Turing Machine are generalized to arbitrary elements of a fixed Lie group action. For this reason, we name this new model the Lie Access Neural Turing Machine, or LANTM. We tested two different configurations of LANTM against an LSTM baseline in several basic experiments. We found the right configuration of LANTM to outperform the baseline in all of our experiments. In particular, we trained LANTM on addition of $k$-digit numbers for $2 \le k \le 16$, and it was able to generalize almost perfectly to $17 \le k \le 32$, all with two orders of magnitude fewer parameters than the LSTM baseline.
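The two ways of moving a head can be sketched as follows. This is an illustrative simplification, not the model itself: the helper names (`translate`, `rotate`, `move_head`) are hypothetical, the key space is fixed to $\mathbb R^2$, the two example groups (translations and planar rotations) are chosen for simplicity, and the hard switch between random access and Lie access is an assumption, since this section does not say how the controller chooses between them.

```python
import math


def translate(position, shift):
    """Action of the translation group (R^2, +): move the head by `shift`."""
    return tuple(p + s for p, s in zip(position, shift))


def rotate(position, angle):
    """Action of the rotation group SO(2): rotate the head about the origin."""
    x, y = position
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def move_head(position, new_address=None, action=None):
    """Move a head by random access (jump) or by Lie access (group action).

    Assumption: a hard either/or choice, used here only for illustration.
    """
    if new_address is not None:
        return tuple(new_address)   # random access: jump to a given address
    if action is not None:
        return action(position)     # Lie access: act on the previous position
    return position                 # no instruction: stay in place


head = (0.0, 0.0)

# The classic tape moves "R" and "L" are the unit translations +1 and -1 ...
head = move_head(head, action=lambda p: translate(p, (1.0, 0.0)))   # "R"
head = move_head(head, action=lambda p: translate(p, (-1.0, 0.0)))  # "L"

# ... but any group element is allowed, e.g. a fractional shift or a rotation.
head = move_head(head, action=lambda p: translate(p, (0.5, 0.25)))
head = move_head(head, action=lambda p: rotate(p, math.pi / 2))

# Random access ignores the previous position entirely.
jumped = move_head(head, new_address=(3.0, -1.0))
```

Since each group element has an inverse, every relative move can be undone by another move of the same kind, which is what lets a head retrace its steps through memory.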
