Advancing Semantic Textual Similarity Modeling: A Regression Framework with Translated ReLU and Smooth K2 Loss

Bowen Zhang; Chunping Li

TL;DR

This work reframes multi-category STS as a regression task, which captures the progressive relationship between similarity categories with a single output node. It introduces two loss functions with zero-gradient zones, Translated ReLU and Smooth K2 Loss, so that the model is trained against discrete ground-truth points yet is not penalized for predictions that fall sufficiently close to them. Across seven STS benchmarks, the regression framework, especially with Smooth K2 Loss, outperforms classification-based training on traditional PLMs and can further improve contrastive-pretrained models through targeted fine-tuning, while reducing the output-layer parameter count and improving computational efficiency. The methods offer a practical, data-efficient alternative to heavy contrastive-learning setups, with code and training details made available for reproducibility.

Abstract

Since the introduction of BERT and RoBERTa, research on Semantic Textual Similarity (STS) has made groundbreaking progress. Particularly, the adoption of contrastive learning has substantially elevated state-of-the-art performance across various STS benchmarks. However, contrastive learning categorizes text pairs as either semantically similar or dissimilar, failing to leverage fine-grained annotated information and necessitating large batch sizes to prevent model collapse. These constraints pose challenges for researchers engaged in STS tasks that involve nuanced similarity levels or those with limited computational resources, compelling them to explore alternatives like Sentence-BERT. Despite its efficiency, Sentence-BERT tackles STS tasks from a classification perspective, overlooking the progressive nature of semantic relationships, which results in suboptimal performance. To bridge this gap, this paper presents an innovative regression framework and proposes two simple yet effective loss functions: Translated ReLU and Smooth K2 Loss. Experimental results demonstrate that our method achieves convincing performance across seven established STS benchmarks and offers the potential for further optimization of contrastive learning pre-trained models.

Paper Structure

This paper contains 14 sections, 3 equations, 3 figures, 7 tables.

Introduction
Related Work
Methodology
Network Architecture
Translated ReLU
Smooth K2 Loss
Experiment
STS Performance Based on Traditional Discriminative Pre-Trained Models
STS Performance Based on Contrastive Learning Pre-Trained Models
Computational Resource Overhead
Impact of Different Hyperparameter Settings
Ablation Studies
Conclusion
Data Filtering Method

Figures (3)

Figure 1: Our Regression Framework. Here, the two BERT models share the same parameters, with "dim" representing the embedding dimensions of $u$ and $v$.
Figure 2: Comparison of Translated ReLU and Smooth K2 Loss, both with $k = 2, x_0 = 0.25$.
Figure 3: Our two-stage fine-tuning process for contrastive learning pre-trained models. In the figure, modules highlighted in red are active during training and undergo backpropagation, while modules in blue are frozen and do not carry out updates.
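
The two loss shapes compared in Figure 2 can be illustrated with a minimal sketch. The exact functional forms are not stated in this section, so the piecewise definitions below are assumptions: both losses are taken to be zero while the absolute error stays under a threshold $x_0$ (the zero-gradient zone), after which Translated ReLU grows linearly with slope $k$ and Smooth K2 Loss grows quadratically. The function names `translated_relu` and `smooth_k2` are hypothetical, and the defaults $k = 2$, $x_0 = 0.25$ are taken from the Figure 2 caption.

```python
def translated_relu(pred: float, label: float, k: float = 2.0, x0: float = 0.25) -> float:
    """Assumed form: zero loss inside the zone |pred - label| < x0, linear outside."""
    x = abs(pred - label)
    return k * (x - x0) if x >= x0 else 0.0


def smooth_k2(pred: float, label: float, k: float = 2.0, x0: float = 0.25) -> float:
    """Assumed form: zero loss inside the zone, quadratic k * (x - x0)^2 outside."""
    x = abs(pred - label)
    return k * (x - x0) ** 2 if x >= x0 else 0.0


if __name__ == "__main__":
    # A prediction close to the label falls in the zero-loss zone for both functions.
    print(translated_relu(0.6, 0.5), smooth_k2(0.6, 0.5))
    # A prediction far from the label is penalized; the quadratic form starts more gently.
    print(translated_relu(1.0, 0.5), smooth_k2(1.0, 0.5))
```

Under these assumed forms, the quadratic variant has a gradient that vanishes continuously at the edge of the zone, whereas the linear variant switches abruptly from slope 0 to slope $k$ at $x_0$.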
