@GrokSet: multi-party Human-LLM Interactions in Social Media

Matteo Migliarini, Berat Ercevik, Oluwagbemike Olowe, Saira Fatima, Sarah Zhao, Minh Anh Le, Vasu Sharma, Ashwinee Panda

Abstract

Large Language Models (LLMs) are increasingly deployed as active participants on public social media platforms, yet their behavior in these unconstrained social environments remains largely unstudied. Existing datasets, drawn primarily from private chat interfaces, lack the multi-party dynamics and public visibility crucial for understanding real-world performance. To address this gap, we introduce @GrokSet, a large-scale dataset of over 1 million tweets involving the @Grok LLM on X. Our analysis reveals a distinct functional shift: rather than serving as a general assistant, the LLM is frequently invoked as an authoritative arbiter in high-stakes, polarizing political debates. However, we observe a persistent engagement gap: despite this visibility, the model functions as a low-status utility, receiving significantly less social validation (likes, replies) than human peers. Finally, we find that this adversarial context exposes shallow alignment: users bypass safety filters not through complex jailbreaks, but through simple persona adoption and tone mirroring. We release @GrokSet as a critical resource for studying the intersection of AI agents and societal discourse.

Paper Structure (37 sections, 1 equation, 25 figures, 6 tables)

Figures (25)

  • Figure 1: The User probes X's enforcement of Turkish content restrictions against opposition voices.
  • Figure 2: Key statistics of the @GrokSet dataset, showing (a) the number of turns per conversation and (b) the distribution of languages across all tweets.
  • Figure 3: Network metrics for @GrokSet. The distribution exhibits a heavy tail: while most conversations adhere to a simple linear structure (low transitivity), a consistent subset displays high structural interconnectivity (right).
  • Figure 4: Transitivity relative to participant count. While the majority of interactions exhibit zero transitivity (indicating linear or star-shaped graphs), a dense cluster of small-group interactions (top left) demonstrates high social cohesion.
  • Figure 5: t-SNE visualization of conversation-level embeddings, showing the 10 most frequent of the 1,112 discovered topics.
  • ...and 20 more figures
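
The captions for Figures 3 and 4 rely on graph transitivity, so a small worked example may help. The sketch below is illustrative only and is not the authors' pipeline: it assumes each conversation is modeled as an undirected graph whose nodes are participants and whose edges link two accounts that interact (a modeling choice the captions do not specify), and the function name `transitivity` and the sample edge lists are hypothetical. It uses the standard definition, the fraction of connected triples that are closed into triangles.

```python
from itertools import combinations


def transitivity(edges):
    """Fraction of connected triples (a - center - b) that are closed by an a - b edge.

    Equivalent to 3 * triangles / connected_triples; returns 0.0 when the
    graph has no connected triples at all.
    """
    # Build an undirected adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    triples = 0  # paths of length two, counted once per center node
    closed = 0   # those paths whose endpoints are also connected
    for neighbors in adj.values():
        for a, b in combinations(neighbors, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical conversation shapes (account names are placeholders).
chain = [("u1", "u2"), ("u2", "u3"), ("u3", "u4")]            # linear reply chain
star = [("bot", "u1"), ("bot", "u2"), ("bot", "u3")]          # everyone talks only to one hub
triangle = [("u1", "u2"), ("u2", "u3"), ("u1", "u3")]         # fully connected small group
mixed = triangle + [("u3", "u4")]                             # small group plus one outsider

chain_t = transitivity(chain)
star_t = transitivity(star)
triangle_t = transitivity(triangle)
mixed_t = transitivity(mixed)
```

Under these assumptions, the chain and star graphs both score 0.0, matching the caption's reading of zero transitivity as linear or star-shaped structure, while the fully connected three-person group scores 1.0 and the mixed graph falls in between.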