Linguistically Conditioned Semantic Textual Similarity

Jingxuan Tu, Keer Xu, Liulu Yue, Bingyang Ye, Kyeongmin Rim, James Pustejovsky

TL;DR

This work targets semantic textual similarity under given conditions (C-STS), identifying substantial annotation errors and ill-defined conditions in the existing dataset. The authors reannotate the validation set, find annotator disagreement on 55% of the instances, and develop a QA-based pipeline that generates condition-focused answers to support error detection and model training. They show that the QA-generated answers correlate more strongly with the reannotations (Spearman up to 55.44) than with the original labels, and that training on these answers yields significant performance gains over baselines. Finally, they propose a conditioning scheme based on typed feature structures to ground conditions linguistically, offering a scalable approach to constructing more precise C-STS data.

Abstract

Semantic textual similarity (STS) is a fundamental NLP task that measures the semantic similarity between a pair of sentences. To reduce the inherent ambiguity posed by the sentences, a recent work called Conditional STS (C-STS) has been proposed to measure the sentences' similarity conditioned on a certain aspect. Despite the popularity of C-STS, we find that the current C-STS dataset suffers from various issues that could impede proper evaluation on this task. In this paper, we reannotate the C-STS validation set and observe an annotator discrepancy on 55% of the instances, resulting from annotation errors in the original labels, ill-defined conditions, and a lack of clarity in the task definition. After a thorough dataset analysis, we improve the C-STS task by leveraging the models' capability to understand the conditions under a QA task setting. With the generated answers, we present an automatic error identification pipeline that identifies annotation errors in the C-STS data with an F1 score above 80%. We also propose a new method that substantially improves performance over baselines on the C-STS data by training the models with the answers. Finally, we discuss conditionality annotation based on the typed-feature structure (TFS) of entity types. We show in examples that the TFS can provide a linguistic foundation for constructing C-STS data with new conditions.
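The two metrics named above, Spearman correlation for agreement between similarity scores and F1 for error identification, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the toy values below are assumptions made purely for illustration, and the figure of 55.44 quoted in the TL;DR is presumably the coefficient multiplied by 100.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def f1_score(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """F1 for a binary 'is this label an annotation error?' decision."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the paper.
answer_similarity = [0.9, 0.2, 0.6, 0.4, 0.8]   # similarity of generated answers
relabeled_scores = [5, 1, 3, 2, 4]              # reannotated similarity labels
gold_errors = [True, False, True, True, False]  # instances that truly are errors
flagged = [True, True, True, False, False]      # instances a pipeline flagged

rho = spearman(answer_similarity, relabeled_scores)
f1 = f1_score(gold_errors, flagged)
```

In practice, library routines such as `scipy.stats.spearmanr` and `sklearn.metrics.f1_score` compute the same quantities; the hand-written versions are shown only to make the definitions explicit.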

Paper Structure (44 sections, 10 figures, 8 tables)

This paper contains 44 sections, 10 figures, 8 tables.

Figures (10)

  • Figure 1: A problematic example from the C-STS dataset. The binary nature of the condition cannot be mapped to a 5-point similarity scale. The label can be subjective, depending on how much inference is made from the context. There is no guideline for the scenario in which information regarding the condition is missing.
  • Figure 2: Distribution of the 10 most frequent features and entities from the conditions in the dataset. For singleton conditions with no explicit mention of a feature, we default the condition feature to type.
  • Figure 3: The similarity score distribution of the original and relabeled validation set.
  • Figure 4: Answer generation and error identification pipeline on the validation set.
  • Figure 5: Model (SimCSE with bi-encoder and GPT) evaluation results on original and relabeled validation set.
  • ...and 5 more figures