Who Are You Behind the Screen? Implicit MBTI and Gender Detection Using Artificial Intelligence

Kourosh Shahnazari, Seyed Moein Ayyoubzadeh

TL;DR

The paper tackles implicit demographic and personality inference from Telegram conversations using Transformer-based models, primarily RoBERTa. By fine-tuning on large, implicitly labeled datasets, it reports up to 86.16% MBTI accuracy under high-confidence constraints and 74.4% gender accuracy, while highlighting that the higher accuracy comes at a significant cost in coverage. The work analyzes the lineage of Transformer architectures, dataset preprocessing, and confidence-threshold tuning to balance precision and applicability in real-world conversational settings. It also discusses ethical considerations and practical implications for privacy, bias, and responsible use of implicit psychological profiling in personalized technologies.

Abstract

In personalized technology and psychological research, accurately detecting demographic features and personality traits from digital interactions is becoming increasingly important. Conventional personality prediction techniques mostly depend on explicitly self-reported labels; this work instead investigates implicit classification, inferring personality and gender directly from linguistic patterns in Telegram conversation data. We fine-tune a Transformer-based language model (RoBERTa) to capture linguistic cues indicative of personality traits and gender differences, using a dataset comprising 138,866 messages from 1,602 users annotated with MBTI types and 195,016 messages from 2,598 users annotated with gender. Applying confidence thresholds raises MBTI accuracy to 86.16%, indicating that RoBERTa can identify implicit personality types from conversational text when only high-confidence predictions are retained. For gender classification, the model obtained an accuracy of 74.4%, capturing gender-specific language patterns. Personality dimension analysis showed that people with introverted and intuitive preferences are more active in text-based interactions. Overall, the results highlight the usefulness of Transformer architectures for implicit personality and gender classification from conversational text, along with the practical trade-off between accuracy and coverage in realistic conversational settings.
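The accuracy and coverage trade-off mentioned in the abstract can be illustrated with a minimal sketch of confidence thresholding. This is not the authors' implementation: the function name `accuracy_and_coverage`, the toy probabilities, and the threshold values are all assumptions made for illustration. The idea is that a prediction is kept only when the classifier's top class probability reaches the threshold, so accuracy is measured on fewer, more confident examples.

```python
from typing import List, Sequence, Tuple


def accuracy_and_coverage(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (accuracy on retained predictions, fraction of examples retained).

    Hypothetical helper: `probs` holds one class-probability vector per
    example (e.g. softmax outputs of a fine-tuned classifier).
    """
    kept = 0
    correct = 0
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)  # index of the top class
        if p[pred] >= threshold:  # keep only confident predictions
            kept += 1
            correct += int(pred == y)
    coverage = kept / len(labels) if labels else 0.0
    accuracy = correct / kept if kept else float("nan")
    return accuracy, coverage


# Toy, made-up probabilities for a two-class problem (not data from the paper).
toy_probs: List[List[float]] = [
    [0.90, 0.10],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.45, 0.55],
]
toy_labels = [0, 1, 0, 1, 0]

for t in (0.0, 0.7):
    acc, cov = accuracy_and_coverage(toy_probs, toy_labels, t)
    print(f"threshold={t:.1f} accuracy={acc:.2f} coverage={cov:.2f}")
```

On this toy input, raising the threshold from 0.0 to 0.7 discards three of the five predictions, which is the kind of coverage loss that accompanies a higher reported accuracy.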

Paper Structure

This paper contains 46 sections, 11 equations, 5 figures, 4 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Distribution of messages across MBTI types and gender categories in the dataset.
  • Figure 2: Distribution of MBTI personality dichotomies within the dataset. Each subplot represents the proportion of messages associated with a specific MBTI feature.
  • Figure 3: Percentage distribution of genders in the dataset. The chart represents the proportion of messages attributed to male and female users.
  • Figure 4: Confusion matrix for RoBERTa on MBTI personality classification.
  • Figure 5: Confusion matrix for RoBERTa on gender classification.