Sharif-STR at SemEval-2024 Task 1: Transformer as a Regression Model for Fine-Grained Scoring of Textual Semantic Relations

Seyedeh Fatemeh Ebrahimi, Karim Akhavan Azari, Amirmasoud Iravani, Hadi Alizadeh, Zeinab Sadat Taghavi, Hossein Sameti

TL;DR

This paper investigates sentence-level semantic textual relatedness (STR) by fine-tuning RoBERTa as a regression model for SemEval-2024 Task 1 Track A across English, Spanish, and Arabic. It leverages a Transformer-based architecture with a regression head, analyzes multilingual performance, and experiments with T5-based data augmentation. The results show robust English (0.82) and Spanish (0.67) correlations but considerably weaker Arabic performance (0.38), highlighting language-resource and linguistic differences. The study suggests future work in multilingual Transformer benchmarks, language-family-specific models, and translation/augmentation techniques to boost STR across low-resource languages.

Abstract

Semantic Textual Relatedness (STR) is central to Natural Language Processing and has applications across many domains. Traditionally, approaches to STR relied on knowledge-based and statistical methods. The emergence of Large Language Models has shifted the field toward new methodologies. In this paper, we investigate sentence-level STR within Track A (Supervised) by fine-tuning the RoBERTa transformer, and we assess the efficacy of this approach across different languages. Our findings indicate promising STR performance, particularly in Latin languages. In English, we achieved a correlation of 0.82, ranking 19th; in Spanish, we achieved a correlation of 0.67, ranking 15th. Our approach is less effective in languages like Arabic, where we observed a correlation of only 0.38, ranking 20th.
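The scores reported above are correlations between a model's predicted relatedness scores and the gold annotations. This section does not say which coefficient is used; the sketch below assumes Spearman rank correlation, a common choice for STR evaluation. The function names and the sample scores are illustrative only and are not taken from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(gold: Sequence[float], predicted: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(gold) != len(predicted) or len(gold) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(gold), average_ranks(predicted))


# Hypothetical gold and predicted relatedness scores for four sentence pairs.
gold_scores = [0.0, 0.25, 0.5, 1.0]
predicted_scores = [0.1, 0.6, 0.4, 0.9]
print(round(spearman(gold_scores, predicted_scores), 2))
```

Because the coefficient depends only on ranks, a regression model is rewarded for ordering sentence pairs correctly rather than for matching the gold scores exactly.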

Paper Structure (20 sections, 1 equation, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Scatter Plots of English, Arabic and Spanish Languages
  • Figure 2: The Confusion Matrix Plot of English, Arabic and Spanish Languages
  • Figure 3: Output of Afrikaans
  • Figure 4: Output of Amharic Language
  • Figure 5: Output of Modern Standard Arabic
  • ...and 4 more figures