Consistency of Responses and Continuations Generated by Large Language Models on Social Media
Wentao Xu, Wenlu Fan, Yuqi Zhu, Bin Wang
TL;DR
This work investigates how large language models manage emotion and semantic relationships in social-media contexts, focusing on climate-change discussions from Twitter and Reddit. It compares four models (Gemma, Llama3, Llama3.3, Claude) across continuation and response tasks, using emotion labeling and an LLM-as-judge framework to assess semantic fidelity to source posts. The findings show that LLMs tend to moderate negative emotions and produce semantically coherent outputs, though their emotional intensity is generally lower than that of human-authored text, and patterns vary by model and task. The results inform the design and deployment of emotion-aware AI in social media, highlighting both the benefit of defusing polarization and the risk of manipulating emotional dynamics.
Abstract
Large Language Models (LLMs) demonstrate remarkable capabilities in text generation, yet their emotional consistency and semantic coherence in social media contexts remain insufficiently understood. This study investigates how LLMs handle emotional content and maintain semantic relationships through continuation and response tasks, using three open-source models (Gemma, Llama3, and Llama3.3) and one commercial model (Claude). By analyzing climate change discussions from Twitter and Reddit, we examine emotional transitions, intensity patterns, and semantic consistency between human-authored and LLM-generated content. Our findings reveal that while all four models maintain high semantic coherence, they exhibit a distinct emotional pattern: a strong tendency to moderate negative emotions. When the input text carries negative emotions such as anger, disgust, fear, or sadness, the LLMs tend to generate content with more neutral emotions, or even convert them into positive emotions such as joy or surprise. We also compare the LLM-generated content with human-authored content. In the response task, the four models systematically generate responses with reduced emotional intensity and show a preference for neutral, rational emotions. In addition, all four models maintain high semantic similarity with the original text, although their performance differs between the continuation task and the response task. These findings offer insight into the emotional and semantic processing capabilities of LLMs, with implications for their deployment in social media environments and for human-computer interaction design.
