Cross-cultural Inspiration Detection and Analysis in Real and LLM-generated Social Media Data
Oana Ignat, Gayathri Ganesh Lakshmy, Rada Mihalcea
TL;DR
This work addresses cross-cultural inspiration detection and generation by building InspAIred, a dataset of 2,000 real inspiring posts, 2,000 real non-inspiring posts, and 2,000 GPT-4-generated inspiring posts, evenly distributed across India and the UK. The authors run linguistic analyses (stylistic, semantic, LIWC) and topic modeling to compare inspiring content across cultures and to contrast AI-generated with human-authored posts, and they evaluate detection performance with RF-TF-IDF, XLM-RoBERTa, and LoRA-Llama setups. They report high accuracy in distinguishing inspiring posts across cultures and data sources, including in few-shot regimes, along with qualitative and quantitative insights into how inspiration manifests differently in each. The public InspAIred dataset and baselines offer a resource for cross-cultural NLP research on motivation, creativity, and content generation, with potential applications in education, health, and media.
Abstract
Inspiration is linked to various positive outcomes, such as increased creativity, productivity, and happiness. Although inspiration has great potential, there has been limited effort toward identifying content that is inspiring, as opposed to just engaging or positive. Additionally, most research has concentrated on Western data, with little attention paid to other cultures. This work is the first to study cross-cultural inspiration through machine learning methods. We aim to identify and analyze real and AI-generated cross-cultural inspiring posts. To this end, we compile and make publicly available the InspAIred dataset, which consists of 2,000 real inspiring posts, 2,000 real non-inspiring posts, and 2,000 generated inspiring posts evenly distributed across India and the UK. The real posts are sourced from Reddit, while the generated posts are created using the GPT-4 model. Using this dataset, we conduct extensive computational linguistic analyses to (1) compare inspiring content across cultures, (2) compare AI-generated inspiring posts to real inspiring posts, and (3) determine if detection models can accurately distinguish between inspiring content across cultures and data sources.
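As a quick sanity check on the dataset composition described above, the following minimal sketch tabulates the per-culture counts. It assumes that "evenly distributed" means each post category is split equally between India and the UK; the abstract does not spell this out, and the names `CATEGORY_SIZES`, `CULTURES`, and `split_evenly` are illustrative, not part of the released dataset or its tooling.

```python
# Category sizes as stated in the abstract.
CATEGORY_SIZES = {
    "real_inspiring": 2000,
    "real_non_inspiring": 2000,
    "generated_inspiring": 2000,
}
CULTURES = ("India", "UK")


def split_evenly(sizes, cultures):
    """Return a {(category, culture): count} table.

    Assumption: every category is divided equally among the cultures.
    """
    table = {}
    for category, total in sizes.items():
        per_culture, remainder = divmod(total, len(cultures))
        if remainder:
            raise ValueError(f"{category} cannot be split evenly")
        for culture in cultures:
            table[(category, culture)] = per_culture
    return table


composition = split_evenly(CATEGORY_SIZES, CULTURES)
total_posts = sum(composition.values())

for (category, culture), count in sorted(composition.items()):
    print(f"{category:22s} {culture:6s} {count}")
print("total posts:", total_posts)
```

Under this assumption, each of the six category and culture cells holds 1,000 posts, for 6,000 posts overall.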
