Syntactic Learnability of Echo State Neural Language Models at Scale
Ryo Ueda, Tatsuki Kuribayashi, Shunsuke Kando, Kentaro Inui
TL;DR
This work asks whether language-learning capability can emerge from a minimal network architecture. It revisits the Echo State Network (ESN), a reservoir computing model with fixed recurrent and input weights and a trainable low-rank output layer, as a language model trained on about 100M words. Across a range of reservoir sizes, ESNs with a large hidden state are comparable or superior to Transformers on grammaticality judgment tasks (BLiMP), while keeping far fewer trainable parameters than large Transformers. The findings suggest that, for syntactic learning, architectures as complex as the Transformer may not always be necessary, motivating further exploration of ESN topologies and multi-scale reservoir dynamics in cognitive and linguistic contexts.
Abstract
What is a neural model with minimum architectural complexity that exhibits reasonable language learning capability? To explore such a simple but sufficient neural language model, we revisit a basic reservoir computing (RC) model, the Echo State Network (ESN), a restricted class of simple Recurrent Neural Networks. Our experiments show that an ESN with a large hidden state is comparable or superior to a Transformer on grammaticality judgment tasks when trained on about 100M words, suggesting that architectures as complex as the Transformer may not always be necessary for syntactic learning.
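To make the ESN setup described above concrete, the following is a minimal NumPy sketch of an ESN language model, not the authors' implementation. It assumes a plain tanh state update, Gaussian random initialization, rescaling of the recurrent matrix to a chosen spectral radius, and a readout factored into two matrices as one way to realize a low-rank output. All names (`make_esn`, `forward`, `trainable_parameter_count`) and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_esn(vocab_size, hidden_size, rank, spectral_radius=0.9, seed=0):
    """Build illustrative ESN parameters (all choices here are assumptions)."""
    rng = np.random.default_rng(seed)
    # Fixed (never trained) input and recurrent weights: the "reservoir".
    w_in = rng.normal(0.0, 1.0, (hidden_size, vocab_size))
    w_rec = rng.normal(0.0, 1.0, (hidden_size, hidden_size))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius,
    # a common heuristic for keeping the reservoir dynamics stable.
    w_rec *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_rec)))
    # Trainable readout, factored as u @ v so its rank is at most `rank`.
    u = rng.normal(0.0, 0.01, (vocab_size, rank))
    v = rng.normal(0.0, 0.01, (rank, hidden_size))
    return {"w_in": w_in, "w_rec": w_rec, "u": u, "v": v}


def forward(params, token_ids):
    """Return next-token logits for each position of a token-id sequence."""
    h = np.zeros(params["w_rec"].shape[0])
    logits = []
    for t in token_ids:
        # Selecting column t of w_in is equivalent to multiplying by a one-hot input.
        h = np.tanh(params["w_rec"] @ h + params["w_in"][:, t])
        logits.append(params["u"] @ (params["v"] @ h))
    return np.stack(logits)


def trainable_parameter_count(params):
    """Only the readout factors would be updated during training."""
    return params["u"].size + params["v"].size


params = make_esn(vocab_size=50, hidden_size=32, rank=4)
logits = forward(params, [1, 7, 3])
print(logits.shape, trainable_parameter_count(params))
```

In this sketch only `u` and `v` would receive gradient updates, so the number of trainable parameters grows linearly with the hidden size rather than quadratically, which is why a reservoir can be made large while the trained part stays small.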
