Emotion Alignment: Discovering the Gap Between Social Media and Real-World Sentiments in Persian Tweets and Images

Sina Elahimanesh, Mohammadali Mohammadkhani, Shohreh Kasaei

TL;DR

The paper addresses how Persian-speaking users express emotions differently in real life versus on social media. It introduces an 11-stage pipeline that fuses Transformer-based text sentiment analysis with image-based facial expression analysis, augmented by human input from participants' friends, to quantify cross-environment emotion similarity using an Earth Mover's Distance metric. A new Persian X dataset (~3300 tweets) with five sentiment labels is collected, and a hybrid text classifier (ParsBERT + LaBSE with a rule-based fallback) achieves 74.08% accuracy on tweets. Real-world emotions show 75.88% alignment with tweets but only 28.67% alignment with images, and all pairwise modality comparisons are statistically significant. A web visualization and qualitative feedback from participants (≈93% satisfaction) demonstrate the approach's utility and privacy-conscious design, offering a framework for multimodal, human-informed emotion analysis across platforms and languages.

Abstract

In contemporary society, widespread social media usage is evident in people's daily lives. Nevertheless, disparities can arise between emotional expressions in the real world and on online platforms. We comprehensively analyzed the Persian community on X to explore this phenomenon. An innovative pipeline was designed to measure the similarity between emotions in the real world and on social media. Accordingly, recent tweets and images of participants were gathered and analyzed using Transformer-based text and image sentiment analysis modules. Each participant's friends also provided insights into their real-world emotions. A distance criterion was used to compare real-world feelings with virtual experiences. Our study encompassed N=105 participants, 393 friends who contributed their perspectives, over 8,300 collected tweets, and 2,000 media images. Results indicated a 28.67% similarity between images and real-world emotions, while tweets exhibited a 75.88% alignment with real-world feelings. Additionally, statistical significance testing confirmed the observed disparities in sentiment proportions.
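
The distance criterion mentioned above is identified in the TL;DR as an Earth Mover's Distance. The following is a minimal sketch of how such a distance could be turned into a 0 to 100 similarity score between two sentiment distributions. It is not the authors' implementation: it assumes the five sentiment classes can be placed in a fixed order with unit spacing, and that similarity is obtained by dividing the distance by its maximum possible value. The function names `emd_1d` and `similarity_percent` are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def normalize(weights: Sequence[float]) -> List[float]:
    """Scale non-negative weights so that they sum to 1."""
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("distribution must have positive total mass")
    return [w / total for w in weights]


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """Earth Mover's Distance between two histograms over the same ordered bins.

    ASSUMPTION: bins are ordered and adjacent bins are one unit apart, in which
    case the distance equals the summed absolute difference of the two CDFs.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    cdf_p = accumulate(normalize(p))
    cdf_q = accumulate(normalize(q))
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def similarity_percent(p: Sequence[float], q: Sequence[float]) -> float:
    """Map the distance to a 0-100 score (ASSUMED normalization, not from the paper)."""
    max_distance = len(p) - 1  # all mass in the first bin vs. all mass in the last bin
    return 100.0 * (1.0 - emd_1d(p, q) / max_distance)
```

Under these assumptions, identical distributions score 100 and distributions concentrated at opposite ends of the ordering score 0. A different ordering of the classes or a different ground distance would change the scores.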

Paper Structure

This paper contains 26 sections, 2 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Overview of the experimental pipeline for analyzing participants' online and offline emotional perceptions. The study is initiated by advertising the experiment on social media platforms such as the X platform (Step 1), followed by collecting contact information of participants’ friends through a preliminary Google Form (Step 2). The system then gathers visual and textual data, including the participant’s profile picture and 20 recent media images (Step 3), and 50 recent tweets (Step 6). Face recognition (Step 4) and facial expression analysis (Step 5) modules, powered by Hugging Face models, are applied to the images. In parallel, a sentiment analysis module processes the textual content of tweets (Step 7). A separate questionnaire is sent to the participant’s friends to capture their perception of the participant’s real-world sentiments (Step 8), and the responses are then averaged (Step 9). The system calculates the distance between the sentiment distributions derived from online data and friends’ responses (Step 10). Finally, the results are visualized on a dedicated website and participants are invited to provide feedback (Step 11).
  • Figure 2: A screenshot of 10 tweets from the gathered dataset is shown here.
  • Figure 3: Architecture of the final hybrid sentiment classification model. The model takes a Persian input sentence, generates two sets of contextual embeddings using ParsBERT and LaBSE models, and concatenates the generated embeddings into a single vector. The final embedding vector is passed through a series of fully connected layers and non-linear activation functions (e.g., ReLU and Leaky ReLU) to produce a probability distribution over five sentiment classes. In parallel, a rule-based model based on keyword detection attempts to predict sentiment. The final output is selected based on a decision rule: if the neural model’s confidence is $\geq 80$ or the rule-based system cannot make a prediction, the model’s output is used; otherwise, the rule-based result is chosen.
  • Figure 4: Overview of the website used in the post-survey study for assessing our results. The left panel (Step 1: Login) depicts the user authentication interface, where participants input their ID on the X platform and a secret key (given to them as a password) to grant access to their social media data. The right panel (Step 2: See the results of the experiment) displays the outcome of the analysis through visualizations comparing emotional expressions and personality features across three dimensions: participants’ tweets, uploaded images, and their friends’ content. The results include donut charts illustrating the distribution of emotional tones (e.g., happy, sad, neutral, angry) and bar charts that quantify the similarity between the user’s real-world personality and their social media expressions. Each similarity score is presented on a scale from 0 to 100, highlighting how consistently individuals portray themselves across different social media modalities.
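
The decision rule described in the Figure 3 caption can be sketched as follows. This is an illustrative reading of the caption rather than the authors' code: it assumes the "80" threshold refers to 80% confidence of the top neural prediction, that class labels are integer indices into the five-class probability vector, and that the rule-based system signals "no prediction" with `None`. The name `select_label` is hypothetical.

```python
from typing import Optional, Sequence

# ASSUMPTION: the caption's threshold of 80 is read as 80% confidence.
CONFIDENCE_THRESHOLD = 0.80


def select_label(
    neural_probs: Sequence[float],
    rule_based_label: Optional[int],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> int:
    """Choose between the neural prediction and the rule-based prediction.

    neural_probs: probability distribution over the sentiment classes.
    rule_based_label: class index from keyword rules, or None if no rule fired.
    """
    # The neural prediction is the class with the highest probability.
    neural_label = max(range(len(neural_probs)), key=lambda i: neural_probs[i])
    confidence = neural_probs[neural_label]

    # Use the neural output when it is confident enough, or when the
    # rule-based system has nothing to offer; otherwise defer to the rules.
    if confidence >= threshold or rule_based_label is None:
        return neural_label
    return rule_based_label
```

With this reading, a confident neural prediction always wins, and the keyword rules only override the network when its top probability falls below the threshold.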