Consistency of Responses and Continuations Generated by Large Language Models on Social Media

Wentao Xu; Wenlu Fan; Yuqi Zhu; Bin Wang

TL;DR

This work investigates how large language models manage emotion and semantic relationships in social-media contexts, focusing on climate-change discussions from Twitter and Reddit. It compares four models (Gemma, Llama3, Llama3.3, Claude) across continuation and response tasks, using emotion labeling and an LLM-as-judge framework to assess semantic fidelity to source posts. The findings show that LLMs tend to moderate negative emotions and produce semantically coherent outputs, though their emotional intensity is generally lower than that of human-authored text and patterns vary by model and task. The results inform the design and deployment of emotion-aware AI in social media, highlighting both the benefit of defusing polarization and the risk of manipulating emotional dynamics.

Abstract

Large Language Models (LLMs) demonstrate remarkable capabilities in text generation, yet their emotional consistency and semantic coherence in social media contexts remain insufficiently understood. This study investigates how LLMs handle emotional content and maintain semantic relationships through continuation and response tasks, using three open-source models (Gemma, Llama3, and Llama3.3) and one commercial model (Claude). By analyzing climate change discussions from Twitter and Reddit, we examine emotional transitions, intensity patterns, and semantic consistency between human-authored and LLM-generated content. Our findings reveal that while all four models maintain high semantic coherence, they exhibit distinct emotional patterns, most notably a strong tendency to moderate negative emotions. When the input text carries negative emotions such as anger, disgust, fear, or sadness, the LLMs tend to generate content with more neutral emotions, or even convert them into positive emotions such as joy or surprise. We also compared the LLM-generated content with human-authored content: the four models systematically generated responses with reduced emotional intensity and showed a preference for neutral, rational emotions in the response task. In addition, all models maintained high semantic similarity with the original text, although their performance differed between the continuation task and the response task. These findings provide insights into the emotional and semantic processing capabilities of LLMs, which are significant for their deployment in social media environments and for human-computer interaction design.

Paper Structure (17 sections, 8 figures, 1 table)

Introduction
Related Works
Evaluation of LLMs generated text
Text generation on social media context
Methodology
Experimental Design
Dataset
Emotion Labeling
Semantic Consistency
Results
Emotion Dynamics of the Original Text in Downstream Tasks
Resources of LLMs' Generated Content Emotions
Comparative Analysis of Emotional Intensity between LLMs and Human Text
Evaluating Semantic Consistency of LLM-Generated Content in Social Media Contexts
Discussion
...and 2 more sections

Figures (8)

Figure 1: Experimental pipeline of consistency evaluation for LLMs. Our experimental framework begins with human text input to four LLMs, which perform two distinct tasks: continuation and response. The continuation task employs a specific prompt instructing the model to expand the text as its author, while the response task operates without explicit prompting to enable natural interaction. Following content generation, we implement emotion detection on the outputs, followed by comprehensive analyses. The framework concludes with parallel analyses of emotional content and semantic consistency to evaluate the consistency of LLM-generated content relative to the original human input.
Figure 2: Daily data amount of Twitter and Reddit. a. Daily comments count of Reddit. b. Daily tweets count of Twitter. The x-axis represents the date, and the y-axis represents the frequency.
Figure 3: Emotional Transition Analysis of LLM Response and Continuation Tasks in Reddit Comments. Panels a, b, c, d, e, f, g, and h illustrate emotional transitions in content generated by Gemma, Llama and Claude models during continuation and response tasks, respectively. The y-axis represents source emotions from human text, while the x-axis indicates emotions in LLM-generated content. Cell values represent the proportion of emotional transitions between original and generated content. For example, in Figure 3a, the value 0.34 in the anger-to-anger cell indicates that 34% of originally angry texts maintained their emotional valence in Gemma's continuation task. The intensity of each cell's shading represents the proportion of emotional transition, with darker shades indicating higher transition frequencies.
Figure 4: Emotional Transition Analysis of LLM Response and Continuation Tasks in Twitter Comments. Panels a, b, c, d, e, f, g, and h illustrate emotional transitions in content generated by Gemma, Llama and Claude models during continuation and response tasks on Twitter, respectively. The y-axis represents the original emotions in human-authored tweets, while the x-axis shows the emotions detected in LLM-generated content. Each cell value represents the proportion of emotional transitions, with darker shades of red indicating higher transition frequencies.
Figure 5: Emotional source analysis of LLM-generated content across platforms. Panels a–n illustrate emotional transitions in the Gemma, Llama and Claude models' continuation and response tasks on Reddit and Twitter data. Red bars indicate generated text whose original text carried a positive emotion (joy or surprise), blue bars indicate an original text with a negative emotion (anger, disgust, fear, or sadness), and green bars indicate an original text with neutral emotion. The y-axis displays the emotional categories present in both original and generated content.
...and 3 more figures
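
The transition proportions described in the Figure 3 and Figure 4 captions (each cell is the share of texts with a given source emotion that end up with a given generated emotion) can be illustrated with a minimal sketch. This is not the authors' code: the function name `transition_matrix`, the label list `EMOTIONS`, and the toy labels are all hypothetical, and the sketch assumes each row is normalized by the number of texts with that source emotion, as the "34% of originally angry texts" example suggests.

```python
from collections import Counter

# Assumed label set, taken from the emotions named in the abstract and captions.
EMOTIONS = ["anger", "disgust", "fear", "sadness", "joy", "surprise", "neutral"]


def transition_matrix(source_labels, generated_labels, emotions=EMOTIONS):
    """Return row-normalized transition proportions as a nested dict.

    matrix[src][gen] is the fraction of texts labeled `src` in the human
    original whose LLM-generated counterpart was labeled `gen`.
    """
    if len(source_labels) != len(generated_labels):
        raise ValueError("source and generated label lists must have equal length")

    counts = {src: Counter() for src in emotions}
    for src, gen in zip(source_labels, generated_labels):
        if src not in counts or gen not in counts:
            raise ValueError(f"unknown emotion label: {src!r} or {gen!r}")
        counts[src][gen] += 1

    matrix = {}
    for src in emotions:
        total = sum(counts[src].values())
        # Rows with no source texts are left as all zeros.
        matrix[src] = {
            gen: (counts[src][gen] / total if total else 0.0) for gen in emotions
        }
    return matrix


# Toy labels for illustration only; these are not data from the paper.
source = ["anger", "anger", "anger", "fear", "joy"]
generated = ["anger", "neutral", "neutral", "neutral", "joy"]
matrix = transition_matrix(source, generated)
```

With these toy labels, one of the three angry source texts stays angry and two become neutral, so the anger row holds one third and two thirds in those cells. Plotting such a matrix as a heatmap, with source emotions on the y-axis and generated emotions on the x-axis, would give a layout like the one the captions describe.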
