Unitary Evolution Recurrent Neural Networks
Martin Arjovsky, Amar Shah, Yoshua Bengio
TL;DR
The paper addresses vanishing and exploding gradients in RNNs by introducing Unitary Evolution RNNs (uRNNs), whose hidden-to-hidden weight matrix is unitary and therefore norm-preserving. The matrix is parameterized as a composition of simple structured blocks, which keeps it unitary without expensive recomputation after each weight update. Hidden states live in the complex domain but are implemented with real-valued computations, so very large hidden states remain affordable. The authors show that uRNNs perform strongly on tasks requiring long-term memory, often surpassing LSTMs and RNNs with orthogonal initialization, and they offer insights into gradient propagation and saturation. Together, these contributions suggest a scalable way to model long-range dependencies in sequential data with unitary recurrent architectures.
Abstract
Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden-to-hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well-studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, with eigenvalues of absolute value exactly 1. The challenge we address is that of parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when considering hidden states in the complex domain. We demonstrate the potential of this architecture by achieving state-of-the-art results in several hard tasks involving very long-term dependencies.
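
The idea of composing structured unitary blocks can be illustrated with a small sketch. This is not the authors' implementation. It assumes the building blocks are diagonal phase matrices, Householder-style reflections, a fixed permutation, and the unitary discrete Fourier transform, applied in one illustrative order. All names (`apply_w`, `make_params`, `d1`, `v1`, and so on) are hypothetical, and the DFT is written naively in O(n^2) for clarity, whereas a practical version would use an FFT. Because every block is unitary, their product is unitary, and the norm of a complex hidden state should stay constant no matter how many times the matrix is applied.

```python
import cmath
import math
import random


def diag_phase(h, thetas):
    # D = diag(exp(i * theta_j)); every diagonal entry has modulus 1.
    return [cmath.exp(1j * t) * x for t, x in zip(thetas, h)]


def reflect(h, v):
    # Householder reflection R = I - 2 v v^* / ||v||^2 applied to h.
    vv = sum(abs(c) ** 2 for c in v)
    proj = sum(c.conjugate() * x for c, x in zip(v, h))  # v^* h
    return [x - 2 * proj * c / vv for c, x in zip(v, h)]


def permute(h, perm):
    # Fixed permutation of the coordinates.
    return [h[p] for p in perm]


def dft(h, inverse=False):
    # Naive unitary DFT (scaled by 1/sqrt(n)); O(n^2), for illustration only.
    n = len(h)
    sign = 1 if inverse else -1
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(h[k] * cmath.exp(sign * 2j * math.pi * j * k / n)
                    for k in range(n))
        for j in range(n)
    ]


def norm(h):
    return math.sqrt(sum(abs(x) ** 2 for x in h))


def make_params(n, rng):
    # O(n) learnable parameters: phases and reflection vectors
    # (hypothetical layout, initialized at random here).
    def phases():
        return [rng.uniform(-math.pi, math.pi) for _ in range(n)]

    def vec():
        return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
                for _ in range(n)]

    return {"d1": phases(), "d2": phases(), "d3": phases(),
            "v1": vec(), "v2": vec(), "perm": rng.sample(range(n), n)}


def apply_w(h, p):
    # Illustrative composition of unitary blocks, applied right to left:
    # W = D3 R2 F^{-1} D2 P R1 F D1 (an assumed ordering).
    h = diag_phase(h, p["d1"])
    h = dft(h)
    h = reflect(h, p["v1"])
    h = permute(h, p["perm"])
    h = diag_phase(h, p["d2"])
    h = dft(h, inverse=True)
    h = reflect(h, p["v2"])
    h = diag_phase(h, p["d3"])
    return h


rng = random.Random(0)
n = 8
params = make_params(n, rng)
h0 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

h = h0
for _ in range(200):  # 200 recurrent steps with no input and no nonlinearity
    h = apply_w(h, params)

ratio = norm(h) / norm(h0)
print(f"norm ratio after 200 steps: {ratio:.12f}")
```

The printed ratio should agree with 1 up to floating-point error. Each block here costs O(n) time and memory apart from the Fourier transform, which is why a parameterization of this kind can scale to large hidden states once an FFT replaces the naive transform.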
