
Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models

Yiming Cui, Wei-Nan Zhang, Wanxiang Che, Ting Liu, Zhigang Chen, Shijin Wang

TL;DR

It is discovered that passage-to-question and passage understanding attentions are the most important ones in the question answering process, showing stronger correlations with the final performance than other parts of the MRC system.

Abstract

Achieving human-level performance on some Machine Reading Comprehension (MRC) datasets is no longer challenging with the help of powerful Pre-trained Language Models (PLMs). However, the internal mechanism of these models remains unclear, which poses an obstacle to understanding them further. This paper conducts a series of analytical experiments to examine the relations between multi-head self-attention and final MRC system performance, revealing the potential explainability of PLM-based MRC models. To ensure the robustness of the analyses, we perform our experiments in a multilingual setting on top of various PLMs. We discover that passage-to-question and passage understanding attentions are the most important ones in the question answering process, showing stronger correlations with the final performance than other parts. Through comprehensive visualizations and case studies, we also observe several general findings on the attention maps, which can help explain how these models solve the questions.
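The abstract refers to attention "zones" such as passage-to-question (P2Q) and passage understanding (P2P). The minimal sketch below shows one way to carve a single head's attention matrix into the four zones Q2Q, Q2P, P2Q and P2P. It assumes a BERT-style input layout `[CLS] question [SEP] passage [SEP]` and reads "X-to-Y" as tokens of X (rows) attending to tokens of Y (columns); both are assumptions for illustration, and the helper names `segment_indices` and `zone_attention_mass` are hypothetical, not taken from the paper.

```python
import numpy as np
from typing import Dict


def segment_indices(q_len: int, p_len: int) -> Dict[str, np.ndarray]:
    """Token positions of the question and passage segments.

    Assumed layout: [CLS] q_1..q_m [SEP] p_1..p_n [SEP]
    """
    question = np.arange(1, 1 + q_len)
    passage = np.arange(2 + q_len, 2 + q_len + p_len)
    return {"Q": question, "P": passage}


def zone_attention_mass(attn: np.ndarray, q_len: int, p_len: int) -> Dict[str, float]:
    """Average share of attention that segment X (rows) sends to segment Y (columns).

    attn is one head's (L, L) attention matrix whose rows sum to 1.
    Returns a value per zone: Q2Q, Q2P, P2Q, P2P.
    """
    idx = segment_indices(q_len, p_len)
    mass = {}
    for src in ("Q", "P"):
        for dst in ("Q", "P"):
            block = attn[np.ix_(idx[src], idx[dst])]
            # Row sum = how much attention one source token puts on the target
            # segment; average over all source tokens in the segment.
            mass[f"{src}2{dst}"] = float(block.sum(axis=1).mean())
    return mass


if __name__ == "__main__":
    q_len, p_len = 2, 5
    seq_len = 3 + q_len + p_len  # [CLS] + question + [SEP] + passage + [SEP]
    uniform = np.full((seq_len, seq_len), 1.0 / seq_len)
    print(zone_attention_mass(uniform, q_len, p_len))
```

With uniform attention over 10 tokens, every question or passage token spends 0.2 of its attention on the 2 question tokens and 0.5 on the 5 passage tokens. In practice, `attn` could be one head's slice of the per-layer attention tensors that a Hugging Face `transformers` model returns when called with `output_attentions=True` (each of shape `(batch, heads, seq, seq)`).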

Paper Structure

This paper contains 27 sections, 4 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Intuitive explanation of four different attention zones for MRC tasks. Q: Question, P: Passage.
  • Figure 2: Attention maps of the 2nd and 4th heads in the last layer of fine-tuned BERT-base on SQuAD. There are strong patterns in the diagonal elements and in the elements related to special tokens.
  • Figure 3: Layer-wise analyses in different attention zones for SQuAD and CMRC 2018. A lighter color means the performance is near the baseline, while a darker color means a larger gap from the baseline (red: above baseline, blue: below baseline).
  • Figure 4: Head-wise analyses in different attention zones for SQuAD and CMRC 2018.
  • Figure 5: Analyses of different question types for SQuAD. Number of questions per type (in order): what (6073), how (1389), who (1377), when (864), which (747), where (508), why (158).
  • ...and 4 more figures
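The captions of Figures 3 and 4 describe comparing performance per attention zone against a baseline, but this page does not say how a zone is switched off for that comparison. The sketch below assumes, purely for illustration, that a zone is disabled by masking its attention scores before the softmax, and that each heatmap cell is the signed gap between the resulting score and the baseline (positive would be drawn red, negative blue). The function names are hypothetical, and the input layout and "rows attend to columns" reading of a zone name are the same assumptions as above; the paper's actual procedure may differ.

```python
import numpy as np
from typing import Dict


def zone_block_mask(q_len: int, p_len: int, zone: str) -> np.ndarray:
    """Boolean (L, L) mask that is True where attention in `zone` is removed.

    zone is one of "Q2Q", "Q2P", "P2Q", "P2P" (rows attend to columns, assumed).
    Assumed layout: [CLS] question [SEP] passage [SEP]
    """
    seq_len = 3 + q_len + p_len
    seg = {
        "Q": np.arange(1, 1 + q_len),
        "P": np.arange(2 + q_len, 2 + q_len + p_len),
    }
    src, dst = zone[0], zone[2]
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    mask[np.ix_(seg[src], seg[dst])] = True
    return mask


def masked_softmax(scores: np.ndarray, block: np.ndarray) -> np.ndarray:
    """Row-wise softmax with blocked positions forced to zero probability.

    Special tokens are never blocked, so no row is entirely -inf.
    """
    s = np.where(block, -np.inf, scores)
    s = s - s.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)


def gap_to_baseline(scores: Dict[str, float], baseline: float) -> Dict[str, float]:
    """Signed gap per zone: > 0 means above baseline, < 0 means below."""
    return {zone: value - baseline for zone, value in scores.items()}
```

Applying `zone_block_mask(q_len, p_len, "P2Q")` to one layer (Figure 3) or one head (Figure 4) and re-evaluating the model would give one cell of such a heatmap under this assumption.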