
Learning Transductions and Alignments with RNN Seq2seq Models

Zhengxiang Wang

TL;DR

The paper addresses whether RNN sequence-to-sequence models can learn core string transductions and generalize beyond their training data. It runs controlled experiments on four tasks ($f_A$–$f_D$: identity, reversal, total reduplication, and quadratic copying) with SRNN, GRU, and LSTM variants, with and without attention. The results show that the models tend to memorize in-distribution mappings and struggle to generalize to unseen input lengths. Attention improves learning efficiency and test performance but does not close the out-of-distribution gap, especially for the hardest task ($f_D$). The paper also reveals task-complexity hierarchies that differ between attention-less and attentional models, and shows that counting abilities vary by architecture. Overall, the work offers a formal-language-theory perspective on neural transductions. It points to limitations as well as directions for future architectures and benchmarks that better probe alignment, counting, and generalization in sequence models. Code and data are publicly available.
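To make the experimental setup concrete, here is a minimal Python sketch of the four tasks and a length-based split into in-distribution (test) and out-of-distribution (gen) data. It assumes that $f_A$–$f_D$ follow the order given in the abstract, with Figure 1 confirming $f_C$ as total reduplication. It also assumes that quadratic copying maps $w$ to $w$ repeated $|w|$ times. The length ranges come from the Figure 3 caption. The alphabet, sample counts, seeds, and the helper name `make_pairs` are hypothetical, and this is not the paper's released data-generation code.

```python
import random

# Hypothetical reference implementations of the four tasks.
# ASSUMPTION: f_A..f_D follow the order given in the abstract.
def f_a(w: str) -> str:
    """Identity: w -> w."""
    return w

def f_b(w: str) -> str:
    """Reversal: w -> reverse of w."""
    return w[::-1]

def f_c(w: str) -> str:
    """Total reduplication: w -> ww."""
    return w + w

def f_d(w: str) -> str:
    """Quadratic copying (assumed definition): w -> w repeated |w| times."""
    return w * len(w)

# Length ranges taken from the Figure 3 caption.
TEST_LENGTHS = range(6, 16)                            # in-distribution: 6-15
GEN_LENGTHS = list(range(1, 6)) + list(range(16, 31))  # out-of-distribution: 1-5, 16-30

def make_pairs(task, lengths, per_length, alphabet="abcdefghij", seed=0):
    """Sample `per_length` random strings per length and pair each with task(w).
    Hypothetical helper: alphabet, counts and seeding are illustrative only."""
    rng = random.Random(seed)
    pairs = []
    for n in lengths:
        for _ in range(per_length):
            w = "".join(rng.choice(alphabet) for _ in range(n))
            pairs.append((w, task(w)))
    return pairs

test = make_pairs(f_c, TEST_LENGTHS, per_length=2)
gen = make_pairs(f_c, GEN_LENGTHS, per_length=2, seed=1)
print(len(test), len(gen))  # 20 40
```

No gen-set length falls inside the 6–15 range. A model that only fits the in-distribution mapping can therefore score well on the test set while failing on the gen set.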

Abstract

The paper studies the capabilities of recurrent neural network sequence-to-sequence (RNN seq2seq) models in learning four transduction tasks: identity, reversal, total reduplication, and quadratic copying. These transductions have traditionally been well studied with finite state transducers, which assign them increasing complexity. We find that RNN seq2seq models are only able to approximate a mapping that fits the training or in-distribution data, instead of learning the underlying functions. Although attention makes learning more efficient and robust, it does not overcome the out-of-distribution generalization limitation. We establish a novel complexity hierarchy for attention-less RNN seq2seq models learning the four tasks, which may be understood in terms of the complexity hierarchy of formal languages rather than that of string transductions. The choice of RNN variant also affects the results. In particular, we show that Simple RNN seq2seq models cannot count the input length.
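The abstract notes that these transductions are classically studied with finite state transducers, and Figure 1 gives a 2-way FST for total reduplication. Below is a minimal Python simulation of one such machine. It assumes that ⋉ marks both ends of the input, as in the Figure 1 caption. The state names, transition table, and function name `run_2way_fst` are a hypothetical construction and not necessarily the machine drawn in Figure 1. The key point is that the read head can move left, so the machine can copy $w$ once, rewind without output, and copy it again.

```python
MARK = "\u22c9"  # ⋉, the end marker used in the Figure 1 caption

# Hypothetical transition table:
# (state, reading_marker) -> (emit_symbol, head_move, next_state)
# head_move: +1 = move right, -1 = move left, 0 = halt.
DELTA = {
    ("q0", True):  (False, +1, "q1"),  # left marker: begin first pass
    ("q1", False): (True,  +1, "q1"),  # first pass: copy the symbol
    ("q1", True):  (False, -1, "q2"),  # right marker: turn around
    ("q2", False): (False, -1, "q2"),  # rewind without output
    ("q2", True):  (False, +1, "q3"),  # left marker: begin second pass
    ("q3", False): (True,  +1, "q3"),  # second pass: copy the symbol
    ("q3", True):  (False,  0, "qf"),  # right marker: halt and accept
}

def run_2way_fst(w: str) -> str:
    """Simulate the 2-way FST on the padded input ⋉w⋉; returns ww.
    Assumes w itself contains no end marker."""
    tape = MARK + w + MARK
    state, pos, out = "q0", 0, []
    while state != "qf":
        sym = tape[pos]
        emit, move, state = DELTA[(state, sym == MARK)]
        if emit:
            out.append(sym)
        pos += move
    return "".join(out)

print(run_2way_fst("abc"))  # abcabc
```

A single end-marker symbol is enough here because the current state already records which end the head has reached.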

Paper Structure

This paper contains 20 sections, 7 equations, 7 figures, and 11 tables.

Figures (7)

  • Figure 1: An example 2-way FST to model total reduplication $f_{C}: w \rightarrow ww$, with $w$ padded into $\ltimes w \ltimes$ as the input. $\lambda$: empty string; $+1$: move right; $-1$: move left.
  • Figure 2: The conjectured mechanism by which RNN seq2seq models learn identity and reversal. The multiple crossings at the top relate to identity; the multiple nested crossings at the bottom relate to reversal.
  • Figure 3: Test/gen set full-sequence accuracy per input length across the four tasks for the three types of RNN seq2seq models, with and without attention. Test set length range: 6–15; gen (generalization) set length range: 1–5 and 16–30.
  • Figure 4: Test/gen set full-sequence accuracy on the three tasks other than quadratic copying for the three attentional RNN seq2seq models with two reduced hidden sizes: 16 and 32. Results for hidden size 64 are omitted because train/test performance is near 100% on all these tasks and therefore less informative.
  • Figure 5: Test/gen set first $n$-symbol accuracy per input length across the four tasks for the three types of RNN seq2seq models, with and without attention.
  • ...and 2 more figures
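Figures 3 to 5 report full-sequence accuracy and first $n$-symbol accuracy per input length. The following minimal Python sketch shows how such per-length scores could be computed. Full-sequence accuracy is taken as exact match between prediction and target. For first $n$-symbol accuracy, the sketch assumes that $n$ is the input length and scores the fraction of the first $n$ target positions predicted correctly. This is one plausible reading, and the paper's exact definition may differ. All function names are hypothetical.

```python
from collections import defaultdict

def full_sequence_accuracy(inputs, preds, targets):
    """Fraction of predictions that match their target exactly (inputs unused)."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)

def first_n_symbol_accuracy(inputs, preds, targets):
    """Mean fraction of the first n target symbols predicted correctly.
    ASSUMPTION: n = input length; missing predicted positions count as wrong."""
    scores = []
    for x, p, t in zip(inputs, preds, targets):
        n = len(x)
        if n == 0:
            continue  # nothing to score for an empty input
        hits = sum(1 for i in range(n) if i < len(p) and i < len(t) and p[i] == t[i])
        scores.append(hits / n)
    return sum(scores) / len(scores) if scores else 0.0

def per_length(metric, inputs, preds, targets):
    """Group examples by input length and apply `metric` within each group."""
    groups = defaultdict(lambda: ([], [], []))
    for x, p, t in zip(inputs, preds, targets):
        g = groups[len(x)]
        g[0].append(x)
        g[1].append(p)
        g[2].append(t)
    return {n: metric(*g) for n, g in sorted(groups.items())}

# Toy example for total reduplication.
inputs = ["ab", "abc", "abc"]
targets = [x + x for x in inputs]
preds = ["abab", "abcabd", "abxabc"]
print(per_length(full_sequence_accuracy, inputs, preds, targets))  # {2: 1.0, 3: 0.0}
```

The toy example shows why the two metrics can diverge. The prediction `abcabd` fails full-sequence accuracy but gets all of its first three symbols right.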