Analyzing Gender Polarity in Short Social Media Texts with BERT: The Role of Emojis and Emoticons
Saba Yousefian Jazi, Amir Mirzaeinia, Sina Yousefian Jazi
TL;DR
This work tackles the problem of inferring an author's gender from short Twitter texts by augmenting a BERT-based classifier with non-word cues such as emojis, emoticons, and user mentions. It details a fine-tuning setup for a BERT-base-uncased model with a sigmoid output head, a dropout of $0.1\%$, and a learning rate of $2\times10^{-5}$ over $10$ epochs on a Tesla T4 GPU. Training uses a gender-balanced (50/50) dataset of celebrity tweets in which emojis were replaced with text. Experiments show that mentions and emoji substitutions significantly influence gender-polarity predictions and that sentiment signals partially explain some misclassifications, although data biases and incomplete emoji coverage limit the conclusions. The findings highlight the role of stylometric cues in short-text gender profiling and suggest improving robustness through broader, more diverse data and more comprehensive emoji handling.
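The sigmoid head mentioned above turns a binary gender decision into a single probability. The following is a minimal plain-Python sketch of that idea. It assumes the head is one linear unit applied to BERT's pooled [CLS] vector (768-dimensional in BERT-base), trained with binary cross-entropy and thresholded at 0.5. The names `FineTuneConfig`, `sigmoid_head`, `bce_loss`, and `predict`, the 0.5 threshold, and the meaning of label 1 are hypothetical illustrations, not the authors' code.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FineTuneConfig:
    # Model name, learning rate, and epoch count follow the setup summarized above.
    model_name: str = "bert-base-uncased"
    learning_rate: float = 2e-5
    epochs: int = 10
    # Assumption: a 0.5 decision threshold on the sigmoid output.
    threshold: float = 0.5


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_head(pooled: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """One linear unit on the pooled [CLS] vector followed by a sigmoid.

    Returns the probability of the positive class (hypothetically, label 1).
    """
    if len(pooled) != len(weights):
        raise ValueError("pooled vector and weights must have the same length")
    logit = sum(x * w for x, w in zip(pooled, weights)) + bias
    return sigmoid(logit)


def bce_loss(prob: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single example; label must be 0 or 1."""
    prob = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def predict(prob: float, cfg: FineTuneConfig = FineTuneConfig()) -> int:
    """Map the sigmoid probability to a hard 0/1 label."""
    return int(prob >= cfg.threshold)


if __name__ == "__main__":
    # Toy 3-dimensional "pooled" vector standing in for a real 768-dim BERT output.
    p = sigmoid_head([1.0, -2.0, 0.5], [0.2, 0.1, 0.4], bias=0.0)
    print(f"p={p:.4f} label={predict(p)} loss={bce_loss(p, 1):.4f}")
```

A single sigmoid output (rather than a two-way softmax) is the natural fit for a balanced binary target. In a real fine-tuning run, the dropout and the optimizer with the learning rate above would act on the BERT weights and this head jointly.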
Abstract
In this work, we fine-tuned several BERT-based models to detect the gender polarity of Twitter accounts. We focused in particular on how the use of emojis and emoticons affects our model's performance on the classification task. We show that these non-word inputs, together with mentions of other accounts, have an impact on detecting the account holder's gender in a short text format such as a tweet.
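To make the handling of these non-word inputs concrete, the following is a minimal preprocessing sketch. It assumes emojis and emoticons are replaced by short English descriptions, in line with the emoji-to-text replacement described in the TL;DR. It also assumes mentions can optionally be masked so their contribution can be compared. The lookup tables, the `normalize_tweet` function, and the `@user` placeholder are hypothetical illustrations, not the authors' pipeline.

```python
import re

# Hypothetical, deliberately tiny lookup tables. Real use would need a full
# emoji inventory, since incomplete coverage is a known limitation.
EMOJI_TO_TEXT = {
    "\U0001F602": " face with tears of joy ",        # 😂
    "\u2764\ufe0f": " red heart ",                   # ❤️ (with variation selector)
    "\u2764": " red heart ",                         # ❤ (bare code point)
    "\U0001F60D": " smiling face with heart eyes ",  # 😍
}
EMOTICON_TO_TEXT = {
    ":)": " smiley ",
    ":(": " frowny ",
    ";)": " wink ",
    ":D": " grin ",
}
# "@handle" not preceded by a word character, so e-mail addresses are left alone.
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def normalize_tweet(text: str, keep_mentions: bool = True) -> str:
    """Replace emojis/emoticons with text; optionally mask @mentions."""
    for table in (EMOJI_TO_TEXT, EMOTICON_TO_TEXT):
        # Longest keys first, so multi-code-point emojis win over their prefixes.
        for symbol in sorted(table, key=len, reverse=True):
            text = text.replace(symbol, table[symbol])
    if not keep_mentions:
        text = MENTION_RE.sub("@user", text)
    # Collapse the extra whitespace introduced by the replacements.
    return " ".join(text.split())


if __name__ == "__main__":
    tweet = "Love this \U0001F60D @alice :)"
    print(normalize_tweet(tweet))
    print(normalize_tweet(tweet, keep_mentions=False))
```

Running the classifier on both variants of each tweet (mentions kept versus masked, emojis converted versus stripped) is one simple way to probe how much each cue contributes to the prediction.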
