ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Sentence

Yuxin Wang, Xiaomeng Zhu, Weimin Lyu, Saeed Hassanpour, Soroush Vosoughi

TL;DR

ImpScore introduces a reference-free metric $I$ that quantifies sentence implicitness as the divergence between semantic meaning and pragmatic interpretation. Built on a Sentence-BERT backbone, it projects embeddings into separate semantic and pragmatic spaces via projection matrices and a transformation, then trains with pairwise contrastive losses on $112{,}580$ (implicit, explicit) pairs to produce robust scores. Across in-distribution and out-of-distribution evaluations, ImpScore achieves high implicitness and pragmatics accuracy and correlates strongly with human judgments, while revealing LLM limitations on highly implicit content in hate-speech contexts. The approach is lightweight, scalable, and openly accessible, enabling large-scale implicitness analysis for evaluation, data curation, and potential reinforcement-learning signals in language model training.
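The scoring idea in the summary above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `w_sem`, `w_prag`, and `w_transform`, the use of cosine distance as the divergence, and the toy 2-dimensional embedding are all assumptions made for the example. In the actual system the embedding would come from a Sentence-BERT encoder and the matrices would be learned.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def implicitness_score(embedding, w_sem, w_prag, w_transform):
    """Score a sentence embedding by how far its pragmatic feature
    diverges from its semantic feature (hypothetical formulation)."""
    semantic = matvec(w_sem, embedding)            # semantic-space feature
    pragmatic = matvec(w_prag, embedding)          # pragmatic-space feature
    # Map the pragmatic feature into the semantic space so both are comparable.
    pragmatic_in_sem_space = matvec(w_transform, pragmatic)
    return cosine_distance(semantic, pragmatic_in_sem_space)

# Toy inputs (assumed values, chosen only to make the behaviour visible).
identity = [[1.0, 0.0], [0.0, 1.0]]
rotate_90 = [[0.0, -1.0], [1.0, 0.0]]
toy_embedding = [1.0, 0.0]

aligned = implicitness_score(toy_embedding, identity, identity, identity)
diverged = implicitness_score(toy_embedding, identity, rotate_90, identity)
```

When the semantic and pragmatic projections agree, the score is zero; when the pragmatic projection rotates the embedding away from the semantic one, the score grows. That matches the summary's reading of implicitness as a divergence between the two interpretations.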

Abstract

Handling implicit language is essential for natural language processing systems to achieve precise text understanding and facilitate natural interactions with users. Despite its importance, the absence of a metric for accurately measuring the implicitness of language significantly constrains the depth of analysis possible in evaluating models' comprehension capabilities. This paper addresses this gap by developing a scalar metric that quantifies the implicitness level of language without relying on external references. Drawing on principles from traditional linguistics, we define "implicitness" as the divergence between semantic meaning and pragmatic interpretation. To operationalize this definition, we introduce ImpScore, a reference-free metric formulated through an interpretable regression model. This model is trained using pairwise contrastive learning on a specially curated dataset consisting of (implicit sentence, explicit sentence) pairs. We validate ImpScore through a user study that compares its assessments with human evaluations on out-of-distribution data, demonstrating its accuracy and strong correlation with human judgments. Additionally, we apply ImpScore to hate speech detection datasets, illustrating its utility and highlighting significant limitations in current large language models' ability to understand highly implicit content. Our metric is publicly available at https://github.com/audreycs/ImpScore.

Paper Structure

This paper contains 40 sections, 11 equations, 25 figures, 12 tables.

Figures (25)

  • Figure 1: An illustration of how humans typically perceive sentences with different implicitness levels. Sentences directly expressing their pragmatics are generally more explicit (indicated by blue arrows), and it is easier to distinguish the implicitness levels between sentences with close pragmatics (expressing similar meaning).
  • Figure 2: An overview of the training process of ImpScore. The sentence in blue is an implicit sentence, while the sentence in green and the sentence in red are explicit sentences in the positive and negative pairs, respectively. Colored $\triangle$ and $\bigcirc$ markers denote the feature embedding points of sentences in corresponding colors. $I_1$, $I_2$, and $I_3$ denote the implicitness scores of these sentences. $\Delta P^+$ and $\Delta P^-$ denote the pragmatic distances of the positive and negative pairs, respectively. $\gamma_1$ and $\gamma_2$ are model hyperparameters.
  • Figure 2: Ablation study results for different model variations of ImpScore. The number to the left of / is the Implicitness Accuracy, and the number to the right is the Pragmatics Accuracy. The best Implicitness Accuracy values are highlighted in bold. ImpScore is robust to different Space Transformation methods.
  • Figure 3: Left panel: Detailed results of ImpScore on the test set. $I^{imp}$ and $I^{exp}$ denote the implicitness scores of the implicit and explicit sentences (higher indicates more implicit), and $\Delta P^+$ and $\Delta P^-$ denote the pragmatic distances of the positive and negative pairs (higher indicates pragmatically farther). Center panel: The distribution of implicitness scores for the implicit sentence, the explicit sentence in the positive pair, and the explicit sentence in the negative pair. Right panel: The distribution of pragmatic distances for positive and negative pairs.
  • Figure 4: Sensitivity of Implicitness Accuracy to the hyperparameters $\gamma_1$ and $\gamma_2$ when $\alpha=1.0$. The highest point is marked with a red dotted box.
  • ...and 20 more figures
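
The training overview caption above names implicitness scores $I_1$, $I_2$, $I_3$, pragmatic distances $\Delta P^+$ and $\Delta P^-$, and margins $\gamma_1$, $\gamma_2$, with a weight $\alpha$ appearing in the sensitivity caption. One plausible way these quantities combine is a pair of hinge-style margin losses, sketched below. The exact form of the loss, the function names, and the numeric values are assumptions for illustration and are not taken from the paper's equations.

```python
def hinge(margin, gap):
    """Penalise a gap that falls short of the required margin."""
    return max(0.0, margin - gap)

def impscore_style_loss(i_implicit, i_explicit_pos, i_explicit_neg,
                        delta_p_pos, delta_p_neg,
                        gamma1, gamma2, alpha=1.0):
    """Hypothetical combined objective for one training triple.

    i_implicit      -- score I_1 of the implicit sentence
    i_explicit_pos  -- score I_2 of the explicit sentence in the positive pair
    i_explicit_neg  -- score I_3 of the explicit sentence in the negative pair
    delta_p_pos     -- pragmatic distance of the positive pair
    delta_p_neg     -- pragmatic distance of the negative pair
    """
    # The implicit sentence should outscore both explicit sentences by gamma1.
    implicitness_loss = (hinge(gamma1, i_implicit - i_explicit_pos)
                         + hinge(gamma1, i_implicit - i_explicit_neg))
    # The negative pair should be pragmatically farther apart by gamma2.
    pragmatic_loss = hinge(gamma2, delta_p_neg - delta_p_pos)
    return implicitness_loss + alpha * pragmatic_loss

# Assumed toy values: one triple that satisfies both margins, one that does not.
satisfied = impscore_style_loss(1.0, 0.0, 0.0, 0.1, 1.0, gamma1=0.5, gamma2=0.5)
violated = impscore_style_loss(0.2, 0.1, 0.3, 0.5, 0.4, gamma1=0.5, gamma2=0.5)
```

In this sketch a triple contributes no loss once the implicit sentence is scored at least $\gamma_1$ above both explicit sentences and the negative pair is at least $\gamma_2$ farther apart pragmatically than the positive pair. Setting `alpha` to zero would train on the implicitness ordering alone.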