MasonTigers at SemEval-2024 Task 1: An Ensemble Approach for Semantic Textual Relatedness
Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, Md Nishat Raihan, Al Nahian Bin Emran, Amrita Ganguly, Marcos Zampieri
TL;DR
This work tackles semantic textual relatedness (STR) across 14 languages in SemEval-2024 Task 1 by building an ensemble framework that fuses statistical embeddings (TF-IDF, PPMI), language-specific BERT models, and sentence transformers. Predictions are calibrated with regression heads (ElasticNet and Linear Regression) and combined through development-guided weighting, achieving improvements over individual models across the supervised, unsupervised, and cross-lingual tracks. The results reveal language-dependent strengths, with English, Kinyarwanda, and Punjabi illustrating the spectrum of performance, and underscore the value of language-adaptive representations under data constraints. Overall, the study demonstrates that ensemble approaches leveraging multilingual embeddings can enhance STR estimation in realistic multilingual settings, where relatedness scores lie in $[0,1]$ and Spearman correlation $\rho$ is the evaluation metric.
Abstract
This paper presents the MasonTigers entry to SemEval-2024 Task 1 - Semantic Textual Relatedness. The task encompasses supervised (Track A), unsupervised (Track B), and cross-lingual (Track C) approaches across 14 different languages. MasonTigers stands out as one of the two teams that participated in all languages across the three tracks. Our approaches achieved rankings ranging from 11th to 21st in Track A, from 1st to 8th in Track B, and from 5th to 12th in Track C. Adhering to the task-specific constraints, our best-performing approaches use an ensemble of statistical machine learning approaches combined with language-specific BERT-based models and sentence transformers.
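
To make the ensembling idea concrete, the following is a minimal sketch of combining per-model relatedness predictions with weights guided by development-set Spearman correlation. The exact weighting scheme is not specified above, so normalizing each model's development-set $\rho$ into a weight is an assumption made for illustration. The model names (`tfidf`, `sbert`), the toy scores, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def dev_guided_weights(
    dev_preds: Dict[str, List[float]], dev_gold: List[float]
) -> Dict[str, float]:
    """Assumed scheme: weight each model by its (non-negative) dev-set rho."""
    scores = {m: max(spearman(p, dev_gold), 0.0) for m, p in dev_preds.items()}
    total = sum(scores.values())
    if total == 0:
        # Fall back to uniform weights if no model correlates positively.
        return {m: 1.0 / len(scores) for m in scores}
    return {m: s / total for m, s in scores.items()}


def ensemble(preds: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-model predictions for each sentence pair."""
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) for i in range(n)]


# Toy development data (hypothetical values, scores in [0, 1]).
dev_gold = [0.1, 0.4, 0.35, 0.8, 0.9]
dev_preds = {
    "tfidf": [0.2, 0.3, 0.5, 0.6, 0.7],
    "sbert": [0.15, 0.45, 0.3, 0.75, 0.95],
}

weights = dev_guided_weights(dev_preds, dev_gold)
combined = ensemble(dev_preds, weights)
print(weights)
print(spearman(combined, dev_gold))
```

Because the weights are non-negative and sum to one, the combined score stays within the range of the individual predictions, so inputs in $[0,1]$ yield outputs in $[0,1]$. In practice the weights would be fit on the development split and then applied to test-set predictions.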
