GMP: A Benchmark for Content Moderation under Co-occurring Violations and Dynamic Rules

Houde Dong; Yifei She; Kai Ye; Liangcai Su; Chenxiong Qian; Jie Hao

TL;DR

This paper raises a critical evaluation question: does high performance on existing static benchmarks truly guarantee that AI judgment generalizes robustly to real-world scenarios involving co-occurring violations and dynamically changing rules?

Abstract

Online content moderation is essential for maintaining a healthy digital environment, and reliance on AI for this task continues to grow. Consider a user comment that uses national stereotypes to insult a politician. This example illustrates two critical challenges in real-world scenarios: (1) Co-occurring Violations, where a single post violates multiple policies (e.g., prejudice and personal attacks); and (2) Dynamic Rules, where the determination of a violation depends on platform-specific guidelines that evolve across contexts. The intersection of co-occurring harms and dynamically changing rules highlights a core limitation of current AI systems: although large language models (LLMs) are adept at following fixed guidelines, their judgment degrades when policies are unstable or context-dependent. In practice, such shortcomings lead to inconsistent moderation: either erroneously restricting legitimate expression or allowing harmful content to remain online. This raises a critical question for evaluation: does high performance on existing static benchmarks truly guarantee robust generalization of AI judgment to real-world scenarios involving co-occurring violations and dynamically changing rules?

Paper Structure (41 sections, 16 equations, 6 figures, 29 tables)

Introduction
Related Work
Current Content Moderation Benchmarks
Generative Language Model as A Judge
Description of GMP Benchmark
Overview and Statistics
Data Construction Pipeline
Benchmark Tasks and Data Composition
Assessing Moderation Generalization
Objectives and Evaluation Philosophy
Measuring Co-occurring Violation Coverage
Measuring Rule-Adaptive Judgment
Cost-Performance Analysis
Ablation Studies
Conclusion
...and 26 more sections

Figures (6)

Figure 1: Overview of GMP Benchmark tasks.
Figure 2: Data construction pipeline of GMP Benchmark. ① We collect potentially harmful content from public datasets and social media, ② adopt an LLM committee for annotation with human arbitration to resolve disagreements, ③ construct the final GMP Benchmark consisting of two subsets that evaluate model ability to identify co-occurring violations and adapt to dynamic rules, respectively.
Figure 3: Detailed performance comparison of all evaluated models on Task A (Identifying Co-occurring Violations).
Figure 4: Detailed performance metrics for Task B (Adapting to Dynamic Rules) across all evaluated models. The table presents F1-Scores and Precision for each of the four rule sets (Rule Set 1 to Rule Set 4).
Figure 5: The relationship between Task A performance (Macro F1-Score) and deployment efficiency: (a) latency trade-off, (b) cost trade-off.
...and 1 more figure
