ListOps: A Diagnostic Dataset for Latent Tree Learning
Nikita Nangia, Samuel R. Bowman
TL;DR
This work introduces ListOps, a toy dataset designed to diagnose the parsing ability of latent-tree learning models by enforcing a single correct parsing strategy. It compares standard sequence models (LSTM) and tree-based models (TreeLSTM) with latent-tree approaches (RL-SPINN, ST-Gumbel), finding that latent-tree methods fail to learn reliable parses and often underperform sequential baselines, even when parsing would be advantageous. The paper demonstrates a clear RNN–TreeRNN capacity gap and positions ListOps as a diagnostic tool to drive development of parsing-aware latent-tree models. The findings emphasize the need for principled parsing mechanisms to improve latent-tree learning and downstream sentence representations.
Abstract
Latent tree learning models learn to parse a sentence without syntactic supervision, and use that parse to build the sentence representation. Existing work on such models has shown that, while they perform well on tasks like sentence classification, they do not learn grammars that conform to any plausible semantic or syntactic formalism (Williams et al., 2018a). Studying the parsing ability of such models in natural language can be challenging due to the inherent complexities of natural language, like having several valid parses for a single sentence. In this paper we introduce ListOps, a toy dataset created to study the parsing ability of latent tree models. ListOps sequences are in the style of prefix arithmetic. The dataset is designed to have a single correct parsing strategy that a system needs to learn to succeed at the task. We show that the current leading latent tree models are unable to learn to parse and succeed at ListOps. These models achieve accuracies worse than purely sequential RNNs.
