MasonTigers@LT-EDI-2024: An Ensemble Approach Towards Detecting Homophobia and Transphobia in Social Media Comments

Dhiman Goswami; Sadiya Sayara Chowdhury Puspo; Md Nishat Raihan; Al Nahian Bin Emran

TL;DR

This work addresses multilingual detection of homophobia and transphobia in online comments across ten languages by combining monolingual transformers and ensemble methods, with a special prompting approach for the low-resource language Tulu. It integrates XLM-R, language-specific BERTs, and a few-shot GPT-3.5 prompting strategy into a weighted ensemble, achieving top-ranked performance in several languages and highlighting the benefits of model diversity. The analysis reveals that imbalanced label distributions largely drive macro F1 variability, motivating future efforts toward larger, more balanced datasets and robust handling of minority classes. Overall, the study demonstrates the practical viability of multilingual ensembles for online safety tasks, while acknowledging ethical considerations and deployment challenges.
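To make the idea of a weighted ensemble concrete, here is a minimal sketch of weighted soft voting over per-model class probabilities. It is an illustration under stated assumptions, not the authors' implementation: the function name `weighted_ensemble`, the three-class layout, the probability values, and the weights are all hypothetical, and the summary above does not say how the weights were actually chosen.

```python
def weighted_ensemble(prob_sets, weights):
    """Combine per-model class probabilities with a weighted average.

    prob_sets: one entry per model; each entry is a list of per-example
               probability lists (one probability per class).
    weights:   one non-negative weight per model (hypothetical values).
    Returns the index of the highest-scoring class for each example.
    """
    if len(prob_sets) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_examples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    predictions = []
    for i in range(n_examples):
        # Weighted average of the models' probabilities for each class.
        averaged = [
            sum(w * probs[i][c] for probs, w in zip(prob_sets, weights)) / total
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=averaged.__getitem__))
    return predictions


# Hypothetical outputs of two models on two comments, three classes each.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.3, 0.6, 0.1], [0.1, 0.1, 0.8]]

print(weighted_ensemble([model_a, model_b], [0.7, 0.3]))
```

Shifting weight from the first model to the second changes which model wins on the first comment, which is the sense in which model diversity can help: models that err on different inputs can correct one another.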

Abstract

In this paper, we describe our approaches and results for Task 2 of the LT-EDI 2024 Workshop, aimed at detecting homophobia and/or transphobia across ten languages. Our methodologies include monolingual transformers and ensemble methods, capitalizing on the strengths of each to enhance the performance of the models. The ensemble models worked well, placing our team, MasonTigers, in the top five for eight of the ten languages, as measured by the macro F1 score. Our work emphasizes the efficacy of ensemble methods in multilingual scenarios, addressing the complexities of language-specific tasks.
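Because the ranking metric is the macro F1 score, each class contributes equally regardless of how many examples it has. The sketch below computes macro F1 from scratch to show the effect; the label names and the toy data are assumptions made for illustration and are not taken from the shared-task datasets.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced data: eight majority-class comments, one of each minority class.
y_true = ["none"] * 8 + ["homophobia", "transphobia"]
# A classifier that always predicts the majority class.
y_pred = ["none"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}, macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

In this toy case the always-majority classifier is right on 8 of 10 comments, yet its macro F1 is below 0.3 because both minority classes score zero. Missing even a few minority-class examples therefore moves the metric sharply.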

Paper Structure (8 sections, 11 figures, 3 tables)

Introduction
Related Works
Datasets
Experiments
Results
Error Analysis
Conclusion
Confusion Matrix

Figures (11)

Figure 1: Sample GPT-3.5 prompt for few shot learning [Used for the Tulu Dataset].
Figure 2: Confusion Matrix for Tamil Language
Figure 3: Confusion Matrix for English Language
Figure 4: Confusion Matrix for Malayalam Language
Figure 5: Confusion Matrix for Marathi Language
...and 6 more figures
