UNSEE: Unsupervised Non-contrastive Sentence Embeddings

Ömer Veysel Çağatan

TL;DR

UNSEE tackles representation collapse in non-contrastive sentence embeddings by introducing a target network, an exponential moving average (EMA) of the online model, which stabilizes training. It systematically explores projection architectures inspired by BYOL, BSL, and other non-contrastive methods, and shows that carefully tuned Online Projection and Single Projection variants can surpass SimCSE on the MTEB benchmark. Trained on data derived from around one million Wikipedia sentences, UNSEE reports state-of-the-art performance among non-contrastive approaches and broad improvements across MTEB tasks, supporting the viability of non-contrastive objectives for high-quality sentence representations. The work highlights the critical role of architecture and optimization in non-contrastive learning and provides practical models for unsupervised sentence embeddings.

Abstract

We present UNSEE: Unsupervised Non-Contrastive Sentence Embeddings, a novel approach that outperforms SimCSE on the Massive Text Embedding Benchmark (MTEB). Our exploration begins by addressing representation collapse, a phenomenon observed when the contrastive objective in SimCSE is replaced with a non-contrastive objective. To counter this issue, we propose a straightforward solution, the target network, which effectively mitigates representation collapse. The target network allows us to use non-contrastive objectives while maintaining training stability and achieving performance improvements comparable to those of contrastive objectives. Through careful fine-tuning and optimization, our method achieves the best performance among non-contrastive sentence embedding methods, yielding stronger sentence representation models and demonstrating the effectiveness of our approach.
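The target network is described as an exponential moving average (EMA) of the online model. A minimal sketch of such an update is shown below. It uses plain Python lists in place of real tensors, and the function name `ema_update`, the parameter names, and the decay value are illustrative assumptions, not values taken from the paper.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def ema_update(target: Params, online: Params, decay: float = 0.99) -> None:
    """Update target in place: target <- decay * target + (1 - decay) * online.

    The decay value is a placeholder; the source does not state which value is used.
    """
    for name, online_values in online.items():
        target_values = target[name]
        for i, online_weight in enumerate(online_values):
            target_values[i] = decay * target_values[i] + (1.0 - decay) * online_weight


# Hypothetical parameters standing in for encoder weights.
online = {"encoder.weight": [1.0, 2.0], "encoder.bias": [0.5]}

# The target network starts as a copy of the online network.
target = {name: list(values) for name, values in online.items()}

# Pretend one gradient step changed the online weights
# (only the online network receives gradient updates).
online["encoder.weight"] = [2.0, 4.0]

# The target moves a small step towards the online weights.
ema_update(target, online, decay=0.9)
print(target["encoder.weight"])
```

With a decay close to 1 the target changes slowly, so it provides a more stable regression target than the online network itself.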

Paper Structure (16 sections, 5 figures, 1 table)

Figures (5)

  • Figure 1: The Projection Model is the same as SimCSE (Gao et al., 2021). The keyword Online emphasizes that the model receives gradient updates. The Online Projection Model is similar to the Projection Model but adds a Target Encoder, which is an exponential moving average of the Online network. In the Online Projection Model, the outputs of both the Online and Target Encoders pass through the same MLP layer. A separate Target MLP is not used because, during fine-tuning, the newly initialized MLP layer changes only slightly, which could corrupt the embeddings. In the Single Projection Model, unlike the Online Projection Model, the Target embeddings do not pass through the MLP layer. The Single Projection Model is identical to the architecture proposed in BSL (Zhang et al., 2021). We use only BERT-base (Devlin et al., 2018) as the encoder.
  • Figure 2: The performance of various non-contrastive objectives on the STSBenchmark evaluation dataset (Cer et al., 2017) in the Projection Model (SimCSE) setting. The models differ in the number of MLP layers. The MLP layer is adopted from BSL (Zhang et al., 2021).
  • Figure 3: The performance of various non-contrastive objectives on STSBenchmark (Cer et al., 2017) in the Online Projection Model with SimCSE hyperparameters. The models differ in the number of MLP layers. The MLP layer is adopted from BSL (Zhang et al., 2021).
  • Figure 4: The performance of various non-contrastive objectives on STSBenchmark (Cer et al., 2017) in the Single Projection Model with SimCSE hyperparameters.
  • Figure 5: The performance of various non-contrastive objectives on STSBenchmark (Cer et al., 2017) in the Single Projection Model with slightly optimized hyperparameters. The models differ in the number of MLP layers. The MLP layer is adopted from BSL (Zhang et al., 2021).
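
The three architectures in the Figure 1 caption differ only in which encoder and which MLP each view passes through. The sketch below shows that wiring with toy callables. The function names, the choice of which view goes to the target branch, and the stand-in arithmetic "modules" are assumptions made for illustration; a real setup would use BERT-base encoders, and no gradients would flow through the target branch.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Module = Callable[[Vector], Vector]


def projection_model(x1: Vector, x2: Vector,
                     online_encoder: Module, mlp: Module) -> Tuple[Vector, Vector]:
    """Projection Model (SimCSE setting): both views use the online encoder and MLP."""
    return mlp(online_encoder(x1)), mlp(online_encoder(x2))


def online_projection_model(x1: Vector, x2: Vector, online_encoder: Module,
                            target_encoder: Module, mlp: Module) -> Tuple[Vector, Vector]:
    """Online Projection Model: the second view uses the target encoder,
    and both encoder outputs pass through the same online MLP."""
    return mlp(online_encoder(x1)), mlp(target_encoder(x2))


def single_projection_model(x1: Vector, x2: Vector, online_encoder: Module,
                            target_encoder: Module, mlp: Module) -> Tuple[Vector, Vector]:
    """Single Projection Model (BSL-style): the target output skips the MLP."""
    return mlp(online_encoder(x1)), target_encoder(x2)


# Toy stand-ins so the data flow is easy to trace; these are not real networks.
def toy_online_encoder(v: Vector) -> Vector:
    return [2.0 * x for x in v]


def toy_target_encoder(v: Vector) -> Vector:
    return [3.0 * x for x in v]


def toy_mlp(v: Vector) -> Vector:
    return [x + 1.0 for x in v]


view_1, view_2 = [1.0], [2.0]
projection_out = projection_model(view_1, view_2, toy_online_encoder, toy_mlp)
online_projection_out = online_projection_model(
    view_1, view_2, toy_online_encoder, toy_target_encoder, toy_mlp)
single_projection_out = single_projection_model(
    view_1, view_2, toy_online_encoder, toy_target_encoder, toy_mlp)
```

In each case the pair of outputs would then be fed to the chosen non-contrastive objective; only the path taken by the second view changes between the three variants.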