Learning Natural Language Inference with LSTM

Shuohang Wang, Jing Jiang

TL;DR

This work introduces a match-LSTM (mLSTM) for natural language inference that processes the hypothesis word-by-word while matching it to an attention-weighted representation of the premise. By remembering important mismatches and down-weighting less informative matches, the model achieves state-of-the-art SNLI performance (86.1% test accuracy). The approach shifts from single embedding-based matching to sequential, memory-augmented word-level alignment, with analyses showing why certain mismatches drive predictions. Despite impressive results, the method’s data hunger motivates exploring additional resources like paraphrase databases to bolster performance on smaller datasets.

Abstract

Natural language inference (NLI) is a fundamentally important task in natural language processing that has many applications. The recently released Stanford Natural Language Inference (SNLI) corpus has made it possible to develop and evaluate learning-centered methods such as deep neural networks for NLI. In this paper, we propose a special long short-term memory (LSTM) architecture for NLI. Our model builds on top of a recently proposed neural attention model for NLI but is based on a significantly different idea. Instead of deriving sentence embeddings for the premise and the hypothesis to be used for classification, our solution uses a match-LSTM to perform word-by-word matching of the hypothesis with the premise. This LSTM is able to place more emphasis on important word-level matching results. In particular, we observe that this LSTM remembers important mismatches that are critical for predicting the contradiction or the neutral relationship label. On the SNLI corpus, our model achieves an accuracy of 86.1%, outperforming the state of the art.

Paper Structure

This paper contains 11 sections, 8 equations, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: The top figure depicts the model by Rocktäschel et al. (2016) and the bottom figure depicts our model. Here $\mathbf{H}^\text{s}$ represents all the hidden states $\mathbf{h}^\text{s}_j$. Note that in the top model each $\mathbf{h}^\text{a}_k$ represents a weighted version of the premise only, while in our model, each $\mathbf{h}^\text{m}_k$ represents the matching between the premise and the hypothesis up to position $k$.
  • Figure 2: The alignment weights and the gate vectors of the three examples.
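
The word-by-word matching described in the abstract and in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. For each hypothesis position $k$, it computes alignment weights over the premise hidden states $\mathbf{h}^\text{s}_j$, forms an attention-weighted premise vector, concatenates it with the hypothesis state, and feeds the result to an LSTM whose hidden state plays the role of $\mathbf{h}^\text{m}_k$. The additive scoring function, the randomly initialised parameters, and all names (`match_lstm`, `init_params`, `W_s`, `W_t`, `W_m`, `w_e`, `W_lstm`) are assumptions chosen for illustration; this summary does not specify them.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))  # subtract max for numerical stability
    return z / z.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(d, seed=0):
    """Random toy parameters (hypothetical; a real model would learn these)."""
    rng = np.random.default_rng(seed)
    return {
        "W_s": rng.normal(0, 0.1, (d, d)),          # projects premise states
        "W_t": rng.normal(0, 0.1, (d, d)),          # projects hypothesis state
        "W_m": rng.normal(0, 0.1, (d, d)),          # projects previous match state
        "w_e": rng.normal(0, 0.1, d),               # scoring vector
        "W_lstm": rng.normal(0, 0.1, (4 * d, 3 * d)),  # gates over [m_k; h_prev]
        "b_lstm": np.zeros(4 * d),
    }

def match_lstm(H_s, H_t, p):
    """H_s: (M, d) premise states; H_t: (N, d) hypothesis states.

    Returns the final match state h^m_N of shape (d,) and the
    (N, M) matrix of alignment weights, one row per hypothesis word.
    """
    d = H_s.shape[1]
    h_m, c = np.zeros(d), np.zeros(d)
    proj_s = H_s @ p["W_s"].T                       # (M, d), reused at every step
    alphas = []
    for h_t in H_t:
        # Assumed additive attention: score each premise position j.
        e = np.tanh(proj_s + p["W_t"] @ h_t + p["W_m"] @ h_m) @ p["w_e"]
        alpha = softmax(e)                          # (M,) alignment weights
        a = alpha @ H_s                             # attention-weighted premise
        m = np.concatenate([a, h_t])                # matching input for step k
        z = p["W_lstm"] @ np.concatenate([m, h_m]) + p["b_lstm"]
        i, f, o = sigmoid(z[:d]), sigmoid(z[d:2 * d]), sigmoid(z[2 * d:3 * d])
        g = np.tanh(z[3 * d:])
        c = f * c + i * g                           # forget gate can drop weak matches
        h_m = o * np.tanh(c)
        alphas.append(alpha)
    return h_m, np.stack(alphas)

# Toy usage: 5 premise words, 3 hypothesis words, hidden size 8.
rng = np.random.default_rng(1)
H_s = rng.normal(size=(5, 8))
H_t = rng.normal(size=(3, 8))
h_last, A = match_lstm(H_s, H_t, init_params(8))
```

In this sketch, `h_last` stands in for the final matching state that a classifier would consume, and each row of `A` is a distribution over premise positions, which is the kind of alignment weight visualised in Figure 2.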