CoT-BERT: Enhancing Unsupervised Sentence Representation through Chain-of-Thought

Bowen Zhang, Kehua Chang, Chunping Li

TL;DR

This work tackles unsupervised sentence representation learning by applying Chain-of-Thought reasoning inside pre-trained language models (PLMs). It introduces CoT-BERT, a two-stage framework (comprehension, then summarization), together with an extended InfoNCE loss and a PAD-based template denoising strategy, all intended to make fuller use of the model's semantic space without external corpora. On seven STS benchmarks, CoT-BERT with RoBERTa_base reaches an average Spearman correlation of $80.62$, a state-of-the-art result achieved using only the pre-trained model itself, with no external components. The ablation studies indicate that both the progressive reasoning and the tailored contrastive objective help unlock the latent capabilities of PLMs for high-quality unsupervised sentence embeddings.

Abstract

Unsupervised sentence representation learning aims to transform input sentences into fixed-length vectors enriched with intricate semantic information while obviating the reliance on labeled data. Recent strides within this domain have been significantly propelled by breakthroughs in contrastive learning and prompt engineering. Despite these advancements, the field has reached a plateau, leading some researchers to incorporate external components to enhance the quality of sentence embeddings. Such integration, though beneficial, complicates solutions and inflates demands for computational resources. In response to these challenges, this paper presents CoT-BERT, an innovative method that harnesses the progressive thinking of Chain-of-Thought reasoning to tap into the latent potential of pre-trained models like BERT. Additionally, we develop an advanced contrastive learning loss function and propose a novel template denoising strategy. Rigorous experimentation demonstrates that CoT-BERT surpasses a range of well-established baselines by relying exclusively on the intrinsic strengths of pre-trained models.

Paper Structure

This paper contains 19 sections, 5 equations, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: Behavior of three variants of InfoNCE Loss within the semantic space of BERT. For clarity, we depict this figure with the anchor sentence $s_i$ as the focal point.
  • Figure 2: Illustration of the template denoising method employed by CoT-BERT. In this depiction, we utilize the template for anchor sentences as an example, with analogous treatment applied to both positive and hard negative instances.
  • Figure 3: Correlation diagram between the true similarity scores and model-predicted cosine similarity on the STS-B test set. The vertical axis has been normalized for clarity, and the methods employed for deriving sentence embeddings ([CLS] or [MASK]) are explicitly indicated for reference.
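The STS evaluation behind the reported Spearman score, and behind the comparison in Figure 3, can be sketched as follows: compute the cosine similarity of each sentence pair's embeddings, then rank-correlate those predictions with the gold similarity scores. This is a minimal illustration in plain Python, not the authors' evaluation code; the function names and the toy embeddings are assumptions made up for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sts_spearman(embeddings_a, embeddings_b, gold_scores):
    """Rank-correlate predicted cosine similarities with gold scores."""
    predicted = [cosine_similarity(u, v) for u, v in zip(embeddings_a, embeddings_b)]
    return spearman(predicted, gold_scores)


# Hypothetical toy data: three sentence pairs with 2-d "embeddings".
toy_a = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
toy_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
toy_gold = [5.0, 3.0, 0.5]
toy_score = sts_spearman(toy_a, toy_b, toy_gold)
```

Because Spearman correlation depends only on rank order, any monotonic rescaling of the predicted similarities (such as the normalized vertical axis in Figure 3) leaves the score unchanged. A reported value such as $80.62$ is presumably the coefficient multiplied by 100.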