Linguistic Landscape of Generative AI Perception: A Global Twitter Analysis Across 14 Languages

Taichi Murayama; Kunihiro Miyazaki; Yasuko Matsubara; Yasushi Sakurai

TL;DR

This study analyzes a global, multilingual Twitter dataset to map public perceptions and usage patterns of generative AI tools across 14 languages. It combines sentiment analysis, odds-ratio analysis, topic modeling (BERTopic), and open coding to build a chatbot-usage taxonomy. The results show a consistent global preference for image-generation tools and caution toward chat-based tools, alongside language-specific nuances (e.g., Chinese users treating chatbots as search substitutes; Italian users favoring creative tasks). The authors introduce an analytical pipeline that includes language-specific sentiment models, an Interest Intensity index, cross-language topic comparisons, and a detailed chatbot-usage taxonomy with high annotation reliability, yielding insights for policy, education, and technology design. The work highlights both universal trends and cultural particularities in generative-AI discourse and provides a foundation for further cross-linguistic and cross-platform investigations of human-AI interaction.
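To make the odds-ratio step concrete, here is a minimal sketch of how word-level odds ratios between two tweet collections could be computed. The exact formulation used in the paper is not given in this summary, so the details are assumptions: the function name `word_odds_ratios` is hypothetical, tokenization is a plain whitespace split, counts are per-tweet presence counts, and 0.5 is added to every cell (the Haldane-Anscombe correction) to avoid division by zero.

```python
from collections import Counter


def word_odds_ratios(target_docs, reference_docs, smoothing=0.5):
    """Return {word: odds ratio} of a word appearing in target vs. reference docs.

    Assumption: a document "contains" a word if the word appears at least
    once after lowercasing and whitespace splitting.
    """
    # Count, for each word, how many documents in each collection contain it.
    target_counts = Counter(w for doc in target_docs for w in set(doc.lower().split()))
    reference_counts = Counter(w for doc in reference_docs for w in set(doc.lower().split()))
    n_target = len(target_docs)
    n_reference = len(reference_docs)

    ratios = {}
    for word in set(target_counts) | set(reference_counts):
        a = target_counts[word] + smoothing                    # target docs with word
        b = n_target - target_counts[word] + smoothing         # target docs without word
        c = reference_counts[word] + smoothing                 # reference docs with word
        d = n_reference - reference_counts[word] + smoothing   # reference docs without word
        ratios[word] = (a / b) / (c / d)
    return ratios


# Toy data, invented purely for illustration.
ai_tweets = ["chatgpt wrote my essay", "chatgpt essay help"]
random_tweets = ["nice weather today", "my lunch today"]
ratios = word_odds_ratios(ai_tweets, random_tweets)
top_words = sorted(ratios, key=ratios.get, reverse=True)[:3]
```

Values above 1 mark words that are over-represented in the target collection, and values below 1 mark words that are over-represented in the reference collection. A real multilingual analysis would need language-specific tokenization instead of a whitespace split.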

Abstract

The advent of generative AI tools has had a profound impact on societies globally, transcending geographical boundaries. Understanding these tools' global reception and utilization is crucial for service providers and policymakers in shaping future policies. Therefore, to unravel the perceptions and engagements of individuals within diverse linguistic communities with regard to generative AI tools, we extensively analyzed over 6.8 million tweets in 14 different languages. Our findings reveal a global trend in the perception of generative AI, accompanied by language-specific nuances. While sentiments toward these tools vary significantly across languages, there is a prevalent positive inclination toward Image tools and a negative one toward Chat tools. Notably, the ban of ChatGPT in Italy led to a sentiment decline and initiated discussions across languages. Furthermore, we established a taxonomy for interactions with chatbots, creating a framework for social analysis underscoring variations in generative AI usage among linguistic communities. We find that the Chinese community predominantly employs chatbots as substitutes for search, while the Italian community tends to use chatbots for tasks such as problem-solving assistance and engaging in entertainment or creative tasks. Our research provides a robust foundation for further explorations of the social dynamics surrounding generative AI tools and offers invaluable insights for decision-makers in policy, technology, and education.

Paper Structure (43 sections, 1 equation, 19 figures, 14 tables)

Introduction
Related Work
Dataset
Target of Generative AI Tools
Collecting generative AI tweets
Tweet data observations
RQ1: How do the sentiments toward generative AI vary across different languages?
Sentiment classification model
Results and Findings
RQ2: How do linguistic communities differ in the content about generative AI tools?
Odds ratio analysis
Topic modeling
RQ3: How do people interact with chatbots?
Extracting interactions from images
Topic modeling
...and 28 more sections

Figures (19)

Figure 1: Time series of sentiment scores for five languages: en, ja, es, fr, and it from October 2022 to May 2, 2023. The x-axis represents time, and the y-axis represents the sentiment score. The sentiment scores for generative AI tools are displayed as follows: the blue line represents Chat tool, orange represents Image tool, and green represents Model/Code tool. The solid gray line indicates the daily sentiment score, while the gray dotted lines represent the 75th and 25th percentiles of daily sentiment scores calculated from random tweets. The vertical red dotted line marks the date when Italy banned access to ChatGPT. We focus on the period from October 2022 onward due to the low volume of tweets and instability in sentiment data prior to this period, which made it difficult to visualize reliable trends across some languages.
Figure 2: Top 10 words most likely to be used, ranked by odds ratio and excluding generative AI tool names and seasonal words, in six languages. Red indicates words that appear in multiple languages and blue indicates words unique to a specific language.
Figure 3: Temporal evolution of chatbot usage across different categories. The vertical red line indicates the release date of GPT-4. The x-axis represents time, and the y-axis represents the number or ratio of interactions in each category.
Figure 4: List of search keywords of generative AI tools
Figure 5: Time series of tweets about generative AI tools
...and 14 more figures
