ViCLSR: A Supervised Contrastive Learning Framework with Natural Language Inference for Natural Language Understanding Tasks

Tin Van Huynh, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract

High-quality text representations are crucial for natural language understanding (NLU), but low-resource languages such as Vietnamese are held back by limited annotated data. Pre-trained models such as PhoBERT and CafeBERT perform well, yet their effectiveness is constrained by this data scarcity. Contrastive learning (CL) has recently emerged as a promising approach for improving sentence representations, enabling models to distinguish between semantically similar and dissimilar sentences. We propose ViCLSR (Vietnamese Contrastive Learning for Sentence Representations), a novel supervised contrastive learning framework designed to optimize sentence embeddings for Vietnamese by leveraging existing natural language inference (NLI) datasets. We also propose a process for adapting existing Vietnamese datasets to supervised learning so that they are compatible with CL methods. In our experiments, ViCLSR significantly outperforms the strong monolingual pre-trained model PhoBERT on five benchmark NLU datasets: ViNLI (+6.97% F1), ViWikiFC (+4.97% F1), ViFactCheck (+9.02% F1), UIT-ViCTSD (+5.36% F1), and ViMMRC2.0 (+4.33% Accuracy). These results show that supervised contrastive learning can effectively address resource limitations in Vietnamese NLU tasks and improve sentence representation learning for low-resource languages. Furthermore, we conduct an in-depth analysis of the experimental results to uncover the factors behind the superior performance of contrastive learning models. ViCLSR is released for research purposes to advance natural language processing tasks.
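To make the idea of supervised contrastive learning on NLI data concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the objective, so the sketch assumes a common formulation in which each premise is the anchor, its entailment hypothesis is the positive, and its contradiction hypothesis is a hard negative, with all other in-batch hypotheses acting as additional negatives. The function names, the label strings `"entailment"` and `"contradiction"`, and the temperature value are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def build_triplets(pairs: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Turn (premise, hypothesis, label) rows into (anchor, positive, negative).

    Assumption: only premises that have both an entailment and a
    contradiction hypothesis are kept; other labels are ignored.
    """
    by_premise: Dict[str, Dict[str, str]] = {}
    for premise, hypothesis, label in pairs:
        # Keep the first hypothesis seen for each (premise, label) pair.
        by_premise.setdefault(premise, {}).setdefault(label, hypothesis)
    triplets = []
    for premise, hyps in by_premise.items():
        if "entailment" in hyps and "contradiction" in hyps:
            triplets.append((premise, hyps["entailment"], hyps["contradiction"]))
    return triplets


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(
    anchors: List[Vector],
    positives: List[Vector],
    negatives: List[Vector],
    temperature: float = 0.05,  # illustrative value, not taken from the paper
) -> float:
    """Mean InfoNCE-style loss over a batch of sentence embeddings.

    For anchor i, the target is positives[i]; every other positive and
    every negative in the batch appears in the denominator.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        logits += [cosine(anchor, n) / temperature for n in negatives]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)
```

In a real training loop the embeddings would come from a shared encoder and the loss would be computed with an autodiff library; the pure-Python version above only shows the arithmetic. The loss is small when each anchor is closer to its own positive than to any other candidate in the batch, and it grows when positives and hard negatives are swapped.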

Paper Structure

This paper contains 25 sections, 4 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Overview of typical text representation models from 2013 to 2024.
  • Figure 2: Overview of our supervised contrastive learning framework (ViCLSR) for Vietnamese NLU tasks, including data preparation, contrastive training, and fine-tuning for downstream tasks.
  • Figure 3: Our contrastive learning architecture with a shared XLM-R encoder.
  • Figure 4: Architecture for fine-tuning the ViCLSR model on various downstream NLU tasks. (a) Fine-tuning for natural language inference and fact-checking tasks, where the model predicts the semantic relationship between a premise (context) and a hypothesis (claim). (b) Fine-tuning for constructive speech detection, which processes single input sentences. (c) Fine-tuning for multiple-choice machine reading comprehension, where the model predicts the correct answer choice from the given context and question.
  • Figure 5: Impact of masking rate during contrastive learning on the downstream performance of ViCLSR across five Vietnamese NLU tasks.
  • ...and 4 more figures