Pcc-tuning: Breaking the Contrastive Learning Ceiling in Semantic Textual Similarity

Bowen Zhang, Chunping Li

TL;DR

This work identifies a theoretical ceiling of $0.875$ (87.5 on the ×100 scale used in the abstract) for the Spearman correlation that contrastive learning can reach on STS tasks, and attributes it to the binary nature of contrastive losses. It then introduces Pcc-tuning, a two-stage approach that first trains with contrastive learning and then fine-tunes on a small set of fine-grained annotations with a Pearson-correlation loss to capture ordinal nuances. Empirically, Pcc-tuning consistently outperforms prior SOTA methods across multiple 7B-scale PLMs and prompts, sometimes surpassing the implied ceiling, while reducing data requirements and remaining memory-efficient. The approach offers a practical path to stronger semantic representations, with robust transferability and low sensitivity to hyperparameters and templates.

Abstract

Semantic Textual Similarity (STS) constitutes a critical research direction in computational linguistics and serves as a key indicator of the encoding capabilities of embedding models. Driven by advances in pre-trained language models and contrastive learning, leading sentence representation methods have reached an average Spearman's correlation score of approximately 86 across seven STS benchmarks in SentEval. However, further progress has become increasingly marginal, with no existing method attaining an average score higher than 86.5 on these tasks. This paper conducts an in-depth analysis of this phenomenon and concludes that the upper limit for Spearman's correlation scores under contrastive learning is 87.5. To transcend this ceiling, we propose an innovative approach termed Pcc-tuning, which employs Pearson's correlation coefficient as a loss function to refine model performance beyond contrastive learning. Experimental results demonstrate that Pcc-tuning can markedly surpass previous state-of-the-art strategies with only a minimal amount of fine-grained annotated samples.

Paper Structure

This paper contains 17 sections, 9 equations, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Illustration of the operation of an optimal binary classifier in handling STS tasks. Although the actual similarity scores of the text pairs are a series of floating-point numbers, the binary classifier focuses solely on categorizing them into two classes: similar and dissimilar, without modeling the variability within each category.
  • Figure 2: The overall architecture of Pcc-tuning. By default, both stages use the manual template "This sentence : '[X]' can be summarized as" [PretCoTandKE-ICIC-2024]. In the diagram, $h_i$ denotes the embedding of sentence $s_i$ after model encoding, $\cos_i$ is the cosine similarity between $h_i$ and $h_i'$, and $\text{score}_i$ is the human-annotated similarity score for $s_i$ and $s_i'$.
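
The second stage described in the Figure 2 caption compares the predicted cosine similarities $\cos_i$ with the human scores $\text{score}_i$ through Pearson's correlation. The sketch below illustrates that idea in plain Python. It is not the authors' implementation: the function names (`pearson`, `cosine`, `pcc_loss`) and the choice of `1 - r` as the loss form are assumptions made for illustration, since this excerpt does not give the exact formula.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if norm == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / norm


def cosine(u, v):
    """Cosine similarity between two vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pcc_loss(embeddings_a, embeddings_b, scores):
    """Assumed loss form: 1 - Pearson(cos_i, score_i) over one batch.

    The loss is 0 when cosine similarities are a perfect increasing linear
    function of the human scores, and grows as the correlation drops.
    """
    cos_sims = [cosine(a, b) for a, b in zip(embeddings_a, embeddings_b)]
    return 1.0 - pearson(cos_sims, scores)


# Toy batch: three sentence pairs with hypothetical 2-d embeddings
# and hypothetical human similarity scores on a 0-5 scale.
batch_a = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
batch_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
human_scores = [5.0, 3.0, 0.0]
loss = pcc_loss(batch_a, batch_b, human_scores)
```

Unlike a contrastive loss that only separates similar from dissimilar pairs, this objective depends on the graded score of every pair in the batch, which is the property the TL;DR credits for capturing ordinal nuances. A real training setup would compute the same quantity with a differentiable tensor library so that gradients flow back into the encoder.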