Local Interpretations for Explainable Natural Language Processing: A Survey

Siwen Luo, Hamish Ivison, Caren Han, Josiah Poon

TL;DR

This work investigates various methods to improve the interpretability of deep neural networks for Natural Language Processing (NLP) tasks, including machine translation and sentiment analysis.

Abstract

As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of black-box models have increased, leading to a greater focus on transparency in deep learning models. This work investigates various methods to improve the interpretability of deep neural networks for Natural Language Processing (NLP) tasks, including machine translation and sentiment analysis. We begin with a comprehensive discussion of the definition of the term interpretability and its various aspects. The methods collected and summarised in this survey concern only local interpretation and are divided into three categories: 1) interpreting the model's predictions through related input features; 2) interpreting through natural language explanation; 3) probing the hidden states of models and word representations.

Paper Structure

This paper contains 41 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Sample visualizations of important input features identified by four different methods. (a) Rationale extraction on a sentiment analysis task; (b) attention weights on a Visual Question Answering task; (c) word importance from attribution methods on a machine translation task; (d) input perturbation on a sentiment analysis task and the expansion of counterfactual explanation.
  • Figure 2: Typology of local interpretable methods by identifying the important features from inputs.
  • Figure 3: Typology of Probing.