CoT-BERT: Enhancing Unsupervised Sentence Representation through Chain-of-Thought
Bowen Zhang, Kehua Chang, Chunping Li
TL;DR
This work addresses unsupervised sentence representation learning by exploiting Chain-of-Thought reasoning within pre-trained language models (PLMs). It introduces CoT-BERT, a two-stage comprehension–summarization framework, together with an extended InfoNCE loss and a PAD-based template denoising strategy, all designed to make fuller use of the semantic space without external corpora. On seven STS benchmarks, CoT-BERT with RoBERTa$_\text{base}$ reaches an average Spearman correlation of $80.62$, achieving state-of-the-art performance without relying on external components or additional data. Thorough ablations indicate that progressive reasoning and the tailored contrastive objective unlock latent capabilities of PLMs for high-quality unsupervised sentence embeddings.
Abstract
Unsupervised sentence representation learning aims to transform input sentences into fixed-length vectors enriched with intricate semantic information while obviating the need for labeled data. Recent progress in this domain has been driven largely by breakthroughs in contrastive learning and prompt engineering. Despite these advancements, the field has reached a plateau, leading some researchers to incorporate external components to enhance the quality of sentence embeddings. Such integration, though beneficial, complicates solutions and inflates demands for computational resources. In response to these challenges, this paper presents CoT-BERT, an innovative method that harnesses the progressive thinking of Chain-of-Thought reasoning to tap into the latent potential of pre-trained models like BERT. Additionally, we develop an advanced contrastive learning loss function and propose a novel template denoising strategy. Rigorous experimentation demonstrates that CoT-BERT surpasses a range of well-established baselines while relying exclusively on the intrinsic strengths of pre-trained models.
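The contrastive loss mentioned above is described in the TL;DR as an extended InfoNCE objective, but its exact form is not given in this section. For orientation only, the sketch below shows the standard in-batch InfoNCE loss (as popularized by SimCSE) that such extensions build on; it is not CoT-BERT's extended loss. Assumptions not taken from the source: cosine similarity as the scoring function, a temperature of 0.05 (the common SimCSE default), and the hypothetical function name `info_nce_loss`.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.05) -> float:
    """Standard in-batch InfoNCE (hypothetical helper, NOT CoT-BERT's extended loss).

    anchors[i] and positives[i] form a positive pair; every positives[j]
    with j != i serves as an in-batch negative for anchors[i].
    """
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must have the same batch size")
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled cosine similarity between anchor i and every candidate
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-softmax probability assigned to the true positive (index i)
        total += log_denom - logits[i]
    return total / len(a)


if __name__ == "__main__":
    # Orthogonal one-hot embeddings paired with themselves: positives are well separated
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(info_nce_loss(identity, identity))  # close to 0
    # All embeddings point the same way: the positive is indistinguishable from negatives
    same = [[1.0, 1.0]] * 4
    print(info_nce_loss(same, same))  # approximately log(4) = 1.386
```

The second usage case illustrates why the loss discourages collapse: when all embeddings coincide, the objective is stuck at $\log N$ for a batch of size $N$, so it can only decrease by pulling each anchor toward its own positive and away from the other sentences in the batch.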
