
Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment

Masatoshi Tsuchiya

TL;DR

The paper investigates training-data quality for Recognizing Textual Entailment (RTE) by proposing a hypothesis-testing framework to detect hidden biases. It introduces a test that predicts textual entailment (TE) labels from hypothesis sentences alone, without premises, using a Naive Bayes model, and contrasts SNLI with SICK, revealing a hypothesis-only bias in SNLI. The results show that a large portion of neural RTE performance on SNLI can be attributed to this bias, so that NN systems trained on the biased data behave in effect as hypothesis-only TE-label predictors. These findings highlight the need to account for dataset biases both when evaluating NN-based RTE models and when constructing datasets.

Abstract

The quality of training data is one of the crucial problems when a learning-centered approach is employed. This paper proposes a new method to investigate the quality of a large corpus designed for the recognizing textual entailment (RTE) task. The proposed method, which is inspired by a statistical hypothesis test, consists of two phases: the first phase is to introduce the predictability of textual entailment labels as a null hypothesis which is extremely unacceptable if a target corpus has no hidden bias, and the second phase is to test the null hypothesis using a Naive Bayes model. The experimental result of the Stanford Natural Language Inference (SNLI) corpus does not reject the null hypothesis. Therefore, it indicates that the SNLI corpus has a hidden bias which allows prediction of textual entailment labels from hypothesis sentences even if no context information is given by a premise sentence. This paper also presents the performance impact of NN models for RTE caused by this hidden bias.
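
The test described in the abstract can be illustrated with a minimal sketch: a Naive Bayes classifier is trained on hypothesis sentences only, and its accuracy is compared with chance. Everything specific here is an assumption made for illustration and is not taken from the paper: the whitespace tokenization, the add-one (Laplace) smoothing, the function names `train_nb`, `predict_nb`, and `accuracy`, and the invented toy sentences in `TOY_TRAIN` and `TOY_TEST`.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Fit a multinomial Naive Bayes model on (hypothesis, label) pairs.

    No premise sentence is used anywhere: the model sees hypotheses only.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for hypothesis, label in examples:
        label_counts[label] += 1
        for word in hypothesis.lower().split():  # assumed tokenization
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict_nb(model, hypothesis):
    """Return the most probable TE label given only the hypothesis."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):  # sorted for deterministic ties
        score = math.log(label_counts[label] / total)
        # Add-one smoothing over the training vocabulary (an assumption).
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in hypothesis.lower().split():
            if word in vocab:  # words unseen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of examples whose label is predicted correctly."""
    correct = sum(predict_nb(model, h) == y for h, y in examples)
    return correct / len(examples)


# Invented toy data, NOT drawn from SNLI or SICK.
TOY_TRAIN = [
    ("a person is outside", "entailment"),
    ("people are outdoors", "entailment"),
    ("nobody is sleeping", "contradiction"),
    ("the man is sleeping", "contradiction"),
    ("a tall man is waiting for a friend", "neutral"),
    ("the woman is waiting for her friend", "neutral"),
]
TOY_TEST = [
    ("people are outside", "entailment"),
    ("a dog is sleeping", "contradiction"),
    ("someone is waiting for a bus", "neutral"),
]

if __name__ == "__main__":
    model = train_nb(TOY_TRAIN)
    chance = 1 / len(model[0])  # uniform guess over the observed labels
    print(f"hypothesis-only accuracy: {accuracy(model, TOY_TEST):.2f}")
    print(f"chance level: {chance:.2f}")
```

On a corpus without hidden bias, hypothesis-only accuracy should stay near chance, because the label depends on the premise. Accuracy well above chance on held-out data is the kind of evidence the abstract reports for SNLI.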

Paper Structure

This paper contains 13 sections, 4 equations, 2 figures, and 6 tables.

Figures (2)

  • Figure 1: Example sentences of RTE. The textual entailment label of $s_h$ is determinable if and only if context information is given by a premise sentence.
  • Figure 2: Confusion matrices of TE Label Prediction Models