Improving Multi-lingual Alignment Through Soft Contrastive Learning

Minsu Park, Seyeon Choi, Chanyeol Choi, Jun-Seong Kim, Jy-yong Sohn

TL;DR

This work tackles multilingual sentence representation learning by transferring sentence-similarity information from a mono-lingual teacher to a multilingual student via soft labels. It introduces soft contrastive learning with two label strategies (Priority and Average) and an optional component, Training Both Cross-lingual and Mono-lingual Space (TCM), which jointly refines the cross-lingual and mono-lingual spaces through the combined loss $L=\lambda L_{\text{cross}}+L_{\text{mono}}$. Empirical results across five languages and four benchmarks (Tatoeba, BUCC, FLORES-200, and STS) show that the proposed soft-label approach outperforms conventional hard-label contrastive losses and MSE distillation, especially when both teacher and student use $\text{mE5}_{\text{base}}$. The findings show strong improvements in bitext mining and notable gains in STS for non-English languages, indicating practical value for cross-lingual NLP tasks and low-resource settings.

Abstract

Building good multi-lingual sentence representations is critical to achieving high performance in cross-lingual downstream tasks. In this work, we propose a novel method to align multi-lingual embeddings based on the similarity of sentences measured by a pre-trained mono-lingual embedding model. Given translation sentence pairs, we train a multi-lingual model so that the similarity between cross-lingual embeddings follows the similarity of sentences measured by the mono-lingual teacher model. Our method can be viewed as contrastive learning with soft labels defined as the similarity between sentences. Our experimental results on five languages show that our contrastive loss with soft labels far outperforms conventional contrastive loss with hard labels on various benchmarks for bitext mining and STS tasks. In addition, our method outperforms existing multi-lingual embeddings, including LaBSE, on the Tatoeba dataset. The code is available at https://github.com/YAI12xLinq-B/IMASCL
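
To make the idea of "contrastive learning with soft labels" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the soft label $w(i,j)$ is a temperature-scaled softmax over the teacher's cosine similarities; the paper's actual Priority and Average label definitions are not reproduced in this section and may differ. The function names, the temperatures `tau_student` and `tau_teacher`, and the toy embeddings are all hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import math

def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(a, b):
    # Entry [i][j] is the cosine similarity between a[i] and b[j].
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]

def log_softmax(row, temperature):
    # Numerically stable log-softmax of row / temperature.
    scaled = [x / temperature for x in row]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def soft_contrastive_loss(student_src, student_tgt, teacher_emb,
                          tau_student=0.05, tau_teacher=0.05):
    # Assumption: soft labels are a softmax over teacher similarities.
    teacher_sim = cosine_matrix(teacher_emb, teacher_emb)
    student_sim = cosine_matrix(student_src, student_tgt)
    n = len(student_sim)
    total = 0.0
    for i in range(n):
        # Soft label distribution w(i, .) for sentence pair i.
        w = [math.exp(lp) for lp in log_softmax(teacher_sim[i], tau_teacher)]
        # Student distributions: source i over targets, target i over sources.
        log_p_st = log_softmax(student_sim[i], tau_student)
        log_p_ts = log_softmax([student_sim[j][i] for j in range(n)], tau_student)
        # Cross-entropy between soft labels and each student distribution.
        total -= sum(wj * lp for wj, lp in zip(w, log_p_st))
        total -= sum(wj * lp for wj, lp in zip(w, log_p_ts))
    return total / (2 * n)

# Toy embeddings for three sentences (hypothetical values).
teacher = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
aligned = soft_contrastive_loss(teacher, teacher, teacher)
misaligned = soft_contrastive_loss(teacher, teacher[::-1], teacher)
print(aligned, misaligned)
```

In this sketch, a hard-label contrastive loss is the special case where each `w` is one-hot at index `i`. With soft labels, the loss is smallest when the student's cross-lingual similarity distribution matches the teacher's, which is why the aligned toy case yields a lower loss than the misaligned one.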

Paper Structure (23 sections, 10 equations, 1 figure, 13 tables)


Figures (1)

  • Figure 1: Overall framework of our method. Given $N$ sentence pairs from source/target languages, we train a multi-lingual student model $f$ by using the similarity between sentences measured by a mono-lingual teacher model $g$. Our contrastive loss function in Eq. \ref{eq:symmetric} uses the soft label $w(i,j)$ defined in Eqs. \ref{eq:label_priority} and \ref{eq:label_avg}.