Analyzing Gender Polarity in Short Social Media Texts with BERT: The Role of Emojis and Emoticons

Saba Yousefian Jazi, Amir Mirzaeinia, Sina Yousefian Jazi

TL;DR

This work tackles the problem of inferring an author’s gender from short Twitter texts by augmenting a BERT-based classifier with nonword cues such as emojis, emoticons, and user mentions. It details a fine-tuning setup for a BERT-base-uncased model with a sigmoid head, dropout of $0.1\%$, and learning rate $2 \times 10^{-5}$ across $10$ epochs on a Tesla T4, using a dataset of 50/50 gender-balanced celebrity tweets after emoji-to-text replacements. Experiments reveal that mentions and emoji substitutions significantly influence gender-polarity predictions and that sentiment signals partially explain some misclassifications, though data biases and incomplete emoji coverage limit conclusions. The findings emphasize stylometric cues in short-text gender profiling and suggest improving robustness through broader, more diverse data and more comprehensive emoji handling.

Abstract

In this work, we fine-tuned several BERT-based models to detect the gender polarity of Twitter accounts. We focused specifically on how emojis and emoticons affect the model's performance on this classification task. We demonstrate that these non-word inputs, together with mentions of other accounts, have an impact on detecting the account holder's gender from a short text such as a tweet.
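
As a rough illustration of the preprocessing the summary describes (replacing emojis and emoticons with text, and either keeping or removing mentions), the following is a minimal sketch in Python. The lookup tables, the `preprocess` function, and the `keep_mentions` flag are hypothetical names chosen for this example; the authors' actual mapping and tooling are not given here, and the summary itself notes that emoji coverage was incomplete.

```python
import re

# Hypothetical, deliberately tiny lookup tables for illustration only.
# The mapping actually used in the paper is not specified in this section.
EMOJI_TO_TEXT = {
    "\U0001F602": "face with tears of joy",
    "\u2764\ufe0f": "red heart",
    "\U0001F525": "fire",
}
EMOTICON_TO_TEXT = {
    ":)": "smiling face",
    ":(": "sad face",
    ";)": "winking face",
}

# Assumption: a mention is "@" followed by word characters.
MENTION_RE = re.compile(r"@\w+")


def replace_symbols(text: str) -> str:
    """Replace every known emoji or emoticon with its text description."""
    for table in (EMOJI_TO_TEXT, EMOTICON_TO_TEXT):
        for symbol, description in table.items():
            # Pad with spaces so the description does not fuse with neighbors.
            text = text.replace(symbol, f" {description} ")
    return text


def preprocess(tweet: str, keep_mentions: bool = True) -> str:
    """Apply emoji-to-text replacement and optionally strip mentions."""
    text = replace_symbols(tweet)
    if not keep_mentions:
        text = MENTION_RE.sub(" ", text)
    # Collapse the extra whitespace introduced by the substitutions.
    return " ".join(text.split())


if __name__ == "__main__":
    tweet = "Great show with @someone :) \U0001F525"
    print(preprocess(tweet))                       # mentions kept
    print(preprocess(tweet, keep_mentions=False))  # mentions removed
```

Calling `preprocess` with `keep_mentions=True` and `keep_mentions=False` corresponds loosely to the two experimental variants named in the figure captions; symbols missing from the tables pass through unchanged, which is one way incomplete coverage can arise.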

Paper Structure (9 sections, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Confusion matrix of original experiment: replacing the emojis with text and including mentions.
  • Figure 2: Confusion matrix of second experiment: replacing the emojis with text and removing mentions.