FanChuan: A Multilingual and Graph-Structured Benchmark For Parody Detection and Analysis

Yilun Zheng, Sha Li, Fangkun Wu, Yang Ziyi, Lin Hongchao, Zhichao Hu, Cai Xinjun, Ziming Wang, Jinxuan Chen, Sitao Luan, Jiahao Xu, Lihui Chen

TL;DR

FanChuan introduces a multilingual, graph-structured parody benchmark comprising seven datasets across English and Chinese, with 21,210 annotated comments and 14,755 annotated users. Each dataset is represented as a heterogeneous information network that encodes contextual interactions through user and post nodes and two edge types, enabling three tasks: $P_1$ Parody Detection, $P_2$ Comment Sentiment Classification, and $P_3$ User Sentiment Classification. The paper evaluates embedding-based methods, inconsistency-based approaches, outlier detectors, graph neural networks, and Large Language Models on these tasks, finding that parody-related problems remain challenging and that contextual information consistently boosts performance, while reasoning LLMs do not reliably improve results. These findings illuminate the limitations of current LLMs in parody detection and emphasize the importance of context and graph-structured representations for cross-lingual parody analysis, with implications for future model development and dataset expansion.
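To make the graph representation concrete, the following is a minimal sketch of a heterogeneous network with user and post nodes and two edge types. The edge-type names (`writes`, `replies_to`), the `HeteroGraph` class, the label fields, and the toy records are all assumptions made for illustration; they are not taken from the FanChuan datasets or code.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy heterogeneous graph: typed nodes and typed, directed edges."""
    # node_type -> {node_id: attribute dict}
    nodes: dict = field(default_factory=lambda: defaultdict(dict))
    # edge_type -> list of (source_id, target_id)
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_type, node_id, **attrs):
        self.nodes[node_type][node_id] = attrs

    def add_edge(self, edge_type, src, dst):
        self.edges[edge_type].append((src, dst))

    def context_of(self, post_id):
        """Return the author of a post and the post it replies to, if any."""
        author = next((u for u, p in self.edges["writes"] if p == post_id), None)
        parent = next((d for s, d in self.edges["replies_to"] if s == post_id), None)
        return {"author": author, "parent": parent}


# Hypothetical toy data: two users, one post, and one reply.
g = HeteroGraph()
g.add_node("user", "u1")
g.add_node("user", "u2")
g.add_node("post", "p1", text="original comment", parody=False)
g.add_node("post", "p2", text="reply comment", parody=True)

# Assumed edge type 1: a user writes a post.
g.add_edge("writes", "u1", "p1")
g.add_edge("writes", "u2", "p2")
# Assumed edge type 2: a post replies to another post.
g.add_edge("replies_to", "p2", "p1")

print(g.context_of("p2"))
```

Under these assumptions, a classifier for any of the three tasks could look up a comment's author and parent post through `context_of` and use them as context, which is the kind of signal the benchmark reports as helpful.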

Abstract

Parody is an emerging phenomenon on social media, where individuals imitate a role or position opposite to their own, often for humor, provocation, or controversy. Detecting and analyzing parody is challenging and often depends on context, yet it plays a crucial role in understanding cultural values, promoting subcultures, and enhancing self-expression. However, the study of parody is hindered by limited data and insufficient diversity in current datasets. To bridge this gap, we built seven parody datasets from both English and Chinese corpora, with 14,755 annotated users and 21,210 annotated comments in total. We also collect replies and construct user-interaction graphs to provide richer contextual information, which is lacking in existing datasets. With these datasets, we test traditional methods and Large Language Models (LLMs) on three key tasks: (1) parody detection, (2) comment sentiment analysis with parody, and (3) user sentiment analysis with parody. Our extensive experiments reveal that parody-related tasks remain challenging for all models and that contextual information plays a critical role. Interestingly, we find that, in certain scenarios, traditional sentence embedding methods combined with simple classifiers can outperform advanced LLMs, i.e., DeepSeek-R1 and GPT-o3, highlighting parody as a significant challenge for LLMs.

Paper Structure

This paper contains 38 sections, 10 figures, 10 tables.

Figures (10)

  • Figure 1: People debate online about the topic, "Should my boyfriend hand over his salary to me?" Some users explicitly support or oppose this viewpoint, while others implicitly express their stance through parody, using humor or even subtle blackmail to make their point.
  • Figure 2: The pipeline for the construction of FanChuan, which includes three key steps: data collection (left), annotation (middle), and preprocessing (right).
  • Figure 3: Examples of a parody dataset as a heterogeneous graph.
  • Figure 4: Performance comparison between reasoning LLMs and non-reasoning LLMs using average F1 Score (%) over six datasets.
  • Figure 5: Impact of contextual information on parody detection across seven datasets.
  • ...and 5 more figures