UNSEE: Unsupervised Non-contrastive Sentence Embeddings
Ömer Veysel Çağatan
TL;DR
UNSEE tackles representation collapse in non-contrastive sentence embeddings by introducing a target network, an exponential moving average (EMA) of the online model, which stabilizes training. It systematically explores projection architectures inspired by BYOL, BSL, and other non-contrastive methods, and shows that carefully tuned Online/Single projection variants can surpass SimCSE on the MTEB benchmark. With training data derived from around one million Wikipedia sentences, UNSEE achieves state-of-the-art performance among non-contrastive approaches and improves broadly across MTEB tasks, supporting the viability of non-contrastive objectives for high-quality sentence representations. The work highlights the critical role of architecture and optimization in non-contrastive learning and provides practical models for robust, unsupervised sentence embeddings.
Abstract
We present UNSEE: Unsupervised Non-Contrastive Sentence Embeddings, a novel approach that outperforms SimCSE on the Massive Text Embedding Benchmark (MTEB). We begin by addressing representation collapse, which occurs when the contrastive objective in SimCSE is replaced with a non-contrastive one. To counter it, we propose a straightforward solution, the target network, which effectively mitigates the collapse. With the target network in place, non-contrastive objectives train stably and deliver performance improvements comparable to those of contrastive objectives. Through careful fine-tuning and optimization, our method achieves the best performance among non-contrastive sentence embeddings and yields strong sentence representation models.
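The TL;DR describes the target network as an exponential moving average (EMA) of the online model. A minimal sketch of one such update is given below in plain Python. The function name `ema_update`, the dictionary-of-lists parameter layout, and the decay values are illustrative assumptions, not details taken from the paper; a real implementation would apply the same rule to the encoder's weight tensors after each optimizer step.

```python
from typing import Dict, List

# Hypothetical parameter container: name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(target: Params, online: Params, decay: float = 0.99) -> None:
    """Update `target` in place: target <- decay * target + (1 - decay) * online.

    The default decay of 0.99 is an assumed value, not one from the paper.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    for name, online_values in online.items():
        target_values = target[name]
        for i, w in enumerate(online_values):
            target_values[i] = decay * target_values[i] + (1.0 - decay) * w


# Toy weights standing in for encoder parameters (made-up numbers).
online = {"w": [1.0, 2.0]}
target = {"w": [0.0, 0.0]}

# One EMA step moves the target a fraction (1 - decay) toward the online weights.
ema_update(target, online, decay=0.9)
print(target["w"])
```

In this scheme the target receives no gradient updates and only tracks the online model slowly, so a decay close to 1 makes it change more gradually.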
