
Explainability of Text Processing and Retrieval Methods: A Survey

Sourav Saha, Debapriyo Majumdar, Mandar Mitra

TL;DR

Approaches that have been applied to explain word embeddings, sequence modeling, attention modules, transformers, BERT, and document ranking are surveyed.

Abstract

Deep learning and machine learning-based models have become extremely popular in text processing and information retrieval. However, the non-linear structures inside these networks make the models largely inscrutable. A significant body of research has focused on increasing their transparency. This article provides a broad overview of research on the explainability and interpretability of natural language processing and information retrieval methods. More specifically, we survey approaches that have been applied to explain word embeddings, sequence modeling, attention modules, transformers, BERT, and document ranking. The concluding section suggests some possible directions for future research on this topic.

Paper Structure (49 sections, 6 equations, 3 figures, 12 tables)

Figures (3)

  • Figure 1: Reproduced from chen2020generating, showing the difference in interpretability results between HEDGE and ACD (singh2019hierarchical). The LSTM makes a wrong prediction because it misses the interaction between "never" and "fails".
  • Figure 2: An example (reproduced from DBLP:conf/aaai/PandeBNKK21) showing the attention sieve received by various input tokens. The current tokens are marked with boxes, and color-coded edges point to other tokens in the sieve for different functional roles: a) block, b) nsubj (syntactic), c) local, d) amod (syntactic), and e) delimiter.
  • Figure 3: Reproduced from hao2021self, showing examples of attribution trees generated on different datasets.