Cross-cultural Inspiration Detection and Analysis in Real and LLM-generated Social Media Data

Oana Ignat; Gayathri Ganesh Lakshmy; Rada Mihalcea

TL;DR

The paper addresses cross-cultural inspiration detection and generation by building InspAIred, a dataset of 2,000 real inspiring posts, 2,000 real non-inspiring posts, and 2,000 GPT-4-generated inspiring posts, evenly distributed across India and the UK. The authors run linguistic analyses (stylistic, semantic, LIWC) and topic modeling to compare inspiring content across cultures and to contrast AI-generated with human-written posts, and they evaluate detection with RF-TF-IDF, XLM-RoBERTa, and LoRA-Llama setups. The detection models distinguish inspiring content across cultures and data sources with high accuracy, including in few-shot regimes, and the analyses give qualitative and quantitative evidence of how inspiration manifests differently across cultures and data sources. The public InspAIred dataset and baselines offer a resource for cross-cultural NLP research on motivation, creativity, and content generation, with potential applications in education, health, and media.

Abstract

Inspiration is linked to various positive outcomes, such as increased creativity, productivity, and happiness. Although inspiration has great potential, there has been limited effort toward identifying content that is inspiring, as opposed to just engaging or positive. Additionally, most research has concentrated on Western data, with little attention paid to other cultures. This work is the first to study cross-cultural inspiration through machine learning methods. We aim to identify and analyze real and AI-generated cross-cultural inspiring posts. To this end, we compile and make publicly available the InspAIred dataset, which consists of 2,000 real inspiring posts, 2,000 real non-inspiring posts, and 2,000 generated inspiring posts evenly distributed across India and the UK. The real posts are sourced from Reddit, while the generated posts are created using the GPT-4 model. Using this dataset, we conduct extensive computational linguistic analyses to (1) compare inspiring content across cultures, (2) compare AI-generated inspiring posts to real inspiring posts, and (3) determine if detection models can accurately distinguish between inspiring content across cultures and data sources.
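The dataset arithmetic stated in the abstract can be made concrete with a short sketch. It assumes that "evenly distributed" means each of the three categories is split equally between the two cultures; the abstract does not spell out the per-culture counts, so the resulting per-cell figure is an inference, and the names `CATEGORIES`, `CULTURES`, and `composition` are illustrative only.

```python
from itertools import product

# Counts stated in the abstract: 2,000 posts in each of three categories.
POSTS_PER_CATEGORY = 2_000
CATEGORIES = ["real inspiring", "real non-inspiring", "generated inspiring"]
CULTURES = ["India", "UK"]


def composition():
    """Return the implied number of posts per (category, culture) cell.

    Assumption: "evenly distributed" means an equal split of every
    category between the two cultures.
    """
    per_cell = POSTS_PER_CATEGORY // len(CULTURES)
    return {(cat, cul): per_cell for cat, cul in product(CATEGORIES, CULTURES)}


cells = composition()
total = sum(cells.values())

for (category, culture), count in cells.items():
    print(f"{category:<20} {culture:<6} {count}")
print("total posts:", total)
```

Under this reading, every category contributes the same number of posts per culture, so any cross-cultural comparison is made on balanced classes.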

Paper Structure (38 sections, 11 figures, 4 tables)

Introduction
Related Work
Automatic Inspiration Detection.
Human vs. LLM-generated Cross-cultural Text.
Computational Linguistics for Social Media Analysis.
The InspAIred Dataset
Real Inspiring Content
Data Collection.
Data Filtering.
Data Annotation.
Quality Assurance.
Data Statistics.
LLM-Generated Inspiring Content
Prompt Design and Robustness
System Prompt.
...and 23 more sections

Figures (11)

Figure 1: We compare AI-generated and human-written inspiring Reddit content across India and the UK. Although it is challenging for a person to distinguish between them, we find significant linguistic cross-cultural differences between generated and real inspiring posts.
Figure 2: Annotation guidelines for labeling inspiration.
Figure 3: Visualization of topics used in the real and generated inspiring posts from the UK. Points are colored red or blue based on the association of their corresponding terms with UK Real inspiring posts or UK LLM-Generated inspiring posts. The most associated topics are listed under Top Generated and Top Real headings. Interactive version: https://github.com/MichiganNLP/cross_inspiration.
Figure 4: Classification test accuracy with the few-shot and default setups with the Random Forest TF-IDF (RF), XLM-RoBERTa base (RB), and Llama 2 7B (LL) models.
Figure 5: Scattertext visualization of unigrams used in the real inspiring and non-inspiring Reddit posts from India. Points are colored in red or blue based on the association of their corresponding terms with Indian Non-inspiring posts or Indian inspiring posts. The most associated terms are listed under "Top inspiring" and "Top Non-inspiring" headings.
...and 6 more figures
