Textual Entailment Recognition with Semantic Features from Empirical Text Representation

Md Shajalal, Md Atabuzzaman, Maksuda Bilkis Baby, Md Rezaul Karim, Alexander Boden

TL;DR

This work addresses textual entailment recognition with a threshold-based empirical semantic representation that filters word-embedding components to form high-dimensional semantic vectors $v_T$ (text) and $v_H$ (hypothesis), from which an element-wise Manhattan distance vector, $EMDV$, is computed for each text-hypothesis pair. $EMDV$ is combined with its scalar average $\mathrm{Avg}(EMDV)$ and handcrafted lexical and semantic features (JAC, BoW, STS) to train several ML classifiers and a majority-voting ensemble. On the SICK-RTE dataset, accuracy improves notably with the full feature set and the thresholded representations; the approach outperforms several baselines and remains competitive with strong deep-learning models. The results show the value of integrating threshold-based semantic representations with traditional features, and the authors point to integration with deep learning as future work.
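To make the $EMDV$ feature concrete, here is a minimal sketch in plain Python. It assumes the semantic vectors $v_T$ and $v_H$ have already been built and have equal length, and it assumes that $EMDV$ is the per-component absolute difference and that $\mathrm{Avg}(EMDV)$ is its arithmetic mean. The threshold-based filtering that produces the vectors is not shown, since this summary does not specify it. The function names `emdv` and `avg_emdv` and the toy values are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def emdv(v_t: Sequence[float], v_h: Sequence[float]) -> List[float]:
    """Element-wise absolute difference between two equal-length vectors.

    Assumption: this is how the element-wise Manhattan distance vector
    is formed; summing the result would give the usual Manhattan distance.
    """
    if len(v_t) != len(v_h):
        raise ValueError("v_t and v_h must have the same length")
    return [abs(a - b) for a, b in zip(v_t, v_h)]


def avg_emdv(distance_vector: Sequence[float]) -> float:
    """Arithmetic mean of the distance vector (0.0 for an empty vector)."""
    if not distance_vector:
        return 0.0
    return sum(distance_vector) / len(distance_vector)


# Toy vectors with made-up values, standing in for v_T and v_H.
v_t = [0.5, -0.25, 1.0, 0.0]
v_h = [0.25, 0.25, 1.0, -0.5]

d = emdv(v_t, v_h)
print(d)            # [0.25, 0.5, 0.0, 0.5]
print(avg_emdv(d))  # 0.3125
```

Under these assumptions, the vector `d` and the scalar `avg_emdv(d)` would be concatenated with the handcrafted features (JAC, BoW, STS) before being passed to a classifier.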

Abstract

Textual entailment recognition is one of the basic natural language understanding (NLU) tasks. Understanding the meaning of sentences is a prerequisite for applying any natural language processing (NLP) technique to recognize textual entailment automatically. A text entails a hypothesis if and only if the truth of the hypothesis follows from the text. Classical approaches generally use the feature values of each word from word embeddings to represent the sentences. In this paper, we propose a novel approach to identifying the textual entailment relationship between a text and a hypothesis, introducing a new semantic feature based on an empirical threshold-based semantic text representation. We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair. We carried out several experiments on a benchmark entailment classification dataset (SICK-RTE). We train several machine learning (ML) algorithms on both semantic and lexical features to classify each text-hypothesis pair as entailment, neutral, or contradiction. Our empirical sentence representation technique enriches the semantic information of the texts and hypotheses and is found to be more efficient than the classical ones. In the end, our approach significantly outperforms known methods in understanding the meaning of the sentences for the textual entailment classification task.
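The TL;DR notes that the trained classifiers are also combined through majority voting. The sketch below shows one way such a vote over the three labels could work. The function name `majority_vote` and the tie-breaking rule (the earliest prediction among the tied labels wins) are assumptions made for illustration; this summary does not say how the paper resolves ties.

```python
from collections import Counter
from typing import Sequence

# The three classes named in the abstract.
LABELS = ("entailment", "neutral", "contradiction")


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most classifiers.

    Assumption: on a tie, the tied label that appears first in
    `predictions` wins.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    unknown = set(predictions) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")

    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in the original order so ties resolve deterministically.
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


# Hypothetical outputs of three classifiers for one text-hypothesis pair.
print(majority_vote(["neutral", "entailment", "entailment"]))  # entailment
```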

Paper Structure

This paper contains 16 sections, 2 equations, 2 figures, 9 tables.

Figures (2)

  • Figure 1: Overview diagram for recognising textual entailment
  • Figure 2: Performance comparison of our proposed method in terms of Accuracy on the SICK-RTE dataset.