Fine-Grained Behavior Simulation with Role-Playing Large Language Model on Social Media

Kun Li, Chenwei Dai, Wei Zhou, Songlin Hu

TL;DR

This work introduces FineRob, a multilingual Fine-Grained Behavior dataset collected from 1,866 real social-media users across Twitter, Reddit, and Zhihu to study how LLMs simulate user actions at the object, type, and content levels. It reveals two principal reasoning patterns in LLMs: role stereotype-based and observation-memory-based, finding the latter more accurate for behavior simulation. To enhance LLM reasoning, the authors propose OM-CoT, a fine-tuning approach that explicitly integrates observation and memory analysis through special tokens <ANA> and <MEM>, Oracle CoT generation, and enhanced supervised fine-tuning. Across nine mainstream LLMs, OM-CoT-FT yields consistent performance gains, though commercial models still outperform open-source ones, and very short-behavior tasks remain challenging; collectively, this work advances fine-grained behavioral modeling in real-world social-media contexts and offers practical insights for robust, interpretable behavior simulation.
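As a rough illustration of how a reasoning trace might be reorganized around the special tokens <ANA> and <MEM>, the sketch below builds a single training target string. The summary does not give the exact template, so the layout is an assumption: <ANA> is taken to introduce the observation analysis and <MEM> the memory analysis. The function name `build_om_cot_target` and the `Answer:` suffix are hypothetical.

```python
# Special tokens named in the summary; their exact placement is assumed here.
ANA_TOKEN = "<ANA>"
MEM_TOKEN = "<MEM>"


def build_om_cot_target(observation_analysis: str,
                        memory_analysis: str,
                        answer: str) -> str:
    """Assemble a hypothetical fine-tuning target string.

    Assumed layout (not confirmed by the source):
    observation analysis first, then memory analysis, then the answer.
    """
    return (
        f"{ANA_TOKEN} {observation_analysis.strip()}\n"
        f"{MEM_TOKEN} {memory_analysis.strip()}\n"
        f"Answer: {answer.strip()}"
    )


target = build_om_cot_target(
    observation_analysis="Option B is a post about open-source LLM releases.",
    memory_analysis="The user has repeatedly commented on LLM release threads.",
    answer="B",
)
print(target)
```

Keeping the two analyses in separately tagged segments is one way to make the observation-memory pattern explicit in the supervision signal; the actual format used by the authors may differ.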

Abstract

Large language models (LLMs) have demonstrated impressive capabilities in role-playing tasks. However, there is limited research on whether LLMs can accurately simulate user behavior in real-world scenarios, such as social media. This requires models to effectively analyze a user's history and simulate their role. In this paper, we introduce FineRob, a novel fine-grained behavior simulation dataset. We collect the complete behavioral history of 1,866 distinct users across three social media platforms. Each behavior is decomposed into three fine-grained elements: object, type, and content, resulting in 78.6k QA records. Based on FineRob, we identify two dominant reasoning patterns in LLMs' behavior simulation processes and propose the OM-CoT fine-tuning method to enhance this capability. Through comprehensive experiments, we conduct an in-depth analysis of key factors of behavior simulation and also demonstrate the effectiveness of the OM-CoT approach. Code and dataset are available at https://github.com/linkseed18612254945/FineRob.
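The decomposition of one behavior into object, type, and content elements can be sketched as follows. This is a minimal illustration, not the dataset's actual schema: the class names, field names, and question wording are all hypothetical, chosen only to show how a single behavior record could yield three QA records.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BehaviorRecord:
    """One observed user behavior (hypothetical fields)."""
    user_id: str
    obj: str            # recipient of the action, e.g. a post or another user
    behavior_type: str  # kind of action, e.g. "comment" or "like"
    content: str        # details of the behavior, e.g. the comment text


@dataclass
class QARecord:
    """One question-answer pair targeting a single behavior element."""
    user_id: str
    element: str
    question: str
    answer: str


def decompose(record: BehaviorRecord) -> List[QARecord]:
    """Split a behavior into three QA records, one per element."""
    # Question templates are illustrative placeholders only.
    elements = [
        ("object", "Which target will the user act on?", record.obj),
        ("type", "Which type of action will the user take?", record.behavior_type),
        ("content", "What will the user write or do?", record.content),
    ]
    return [QARecord(record.user_id, name, question, answer)
            for name, question, answer in elements]


example = BehaviorRecord("user_001", "post_42", "comment", "Great write-up!")
qa_records = decompose(example)
for qa in qa_records:
    print(qa.element, "->", qa.answer)
```

Each behavior contributes up to three records under this scheme, which is consistent with a QA count well above the number of users.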

Paper Structure

This paper contains 32 sections, 5 figures, 8 tables.

Figures (5)

  • Figure 1: An example from FineRob, which requires the LLM to simulate behavior choices that align with a role's profile and historical data. We decompose a complete behavior record into three fine-grained components: selecting the recipient of the action, determining the action type, and specifying the behavior details.
  • Figure 2: Overview of our work. The left and middle sections of the figure illustrate the process of constructing the FineRob dataset. The right section shows the OM-CoT fine-tuning details, including data augmentation, reorganization with special tokens, and SFT training.
  • Figure 3: Two typical patterns of CoT reasoning for behavior simulation. The "Role Stereotype" pattern focuses on role analysis. The "Observation and Memory" pattern simulates future behavior by considering the relationship between the character's history and observed options.
  • Figure 4: Analysis of simulation accuracy changes across different similarity levels between reasoning and various parts of the prompt. The results are generated using ChatGPT-3.5-turbo-0125 on the Twitter test set, with the average F1-score calculated across three behavior element tasks.
  • Figure 5: The relationship between input historical behavior size and the accuracy of simulating fine-grained behavior elements. The figure presents the results of three methods on the Twitter dataset.