
Open-Domain Safety Policy Construction

Di Wu, Siyue Liu, Zixiang Ji, Ya-Liang Chang, Zhe-Yu Liu, Andrew Pleffer, Kai-Wei Chang

Abstract

Moderation layers are increasingly a core component of many products built on user- or model-generated content. However, drafting and maintaining domain-specific safety policies remains costly. We present Deep Policy Research (DPR), a minimal agentic system that drafts a full content moderation policy based on only human-written seed domain information. DPR uses a single web search tool and lightweight scaffolding to iteratively propose search queries, distill diverse web sources into policy rules, and organize rules into an indexed document. We evaluate DPR on (1) the OpenAI undesired content benchmark across five domains with two compact reader LLMs and (2) an in-house multimodal advertisement moderation benchmark. DPR consistently outperforms definition-only and in-context learning baselines, and in our end-to-end setting it is competitive with expert-written policy sections in several domains. Moreover, under the same seed specification and evaluation protocol, DPR outperforms a general-purpose deep research system, suggesting that a task-specific, structured research loop can be more effective than generic web research for policy drafting. We release our experiment code at https://github.com/xiaowu0162/deep-policy-research.

Paper Structure

This paper contains 38 sections, 16 figures, 8 tables.

Figures (16)

  • Figure 1: An illustration of Deep Policy Research. Based on a domain specification, an LLM iteratively interacts with a search engine, extracts policy rules, and indexes the rules through keyphrase-based clustering.
  • Figure 2: Detailed specifications of the domains studied in this paper. These prompts were created solely for the purposes of this article and are provided for illustrative use only. They do not reflect official Taboola policy, which may be updated or revised over time.
  • Figure 3: Prompt for generating web search queries. The research agent uses it to identify missing coverage.
  • Figure 4: Prompt for extracting rules from a webpage chunk. The research agent uses it to generate new candidate rules.
  • Figure 5: Prompt for scoring rule relevance. The research agent uses it to filter candidate rules.
  • ...and 11 more figures
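The research loop described in the abstract and in Figures 1 and 3-5 (propose search queries, extract candidate rules from web pages, filter by relevance, index by keyphrase clustering) can be sketched in pseudocode. This is a minimal illustrative sketch, not the released implementation: every helper below (`search`, `propose_queries`, `extract_rules`, `score_relevance`, `keyphrase`, and the threshold) is a hypothetical stand-in for what the paper implements with a web search tool and LLM prompts.

```python
from collections import defaultdict

def search(query):
    # Stand-in for the single web search tool; returns webpage text chunks.
    return [f"web content retrieved for '{query}'"]

def propose_queries(spec, index):
    # Stand-in for the query-generation prompt (cf. Figure 3):
    # the agent targets coverage gaps given the rules gathered so far.
    return [f"{spec} content moderation guidelines (round {len(index)})"]

def extract_rules(chunk):
    # Stand-in for the rule-extraction prompt (cf. Figure 4).
    return [f"rule distilled from: {chunk}"]

def score_relevance(rule, spec):
    # Stand-in for the relevance-scoring prompt (cf. Figure 5);
    # in the real system an LLM assigns this score.
    return 1.0

def keyphrase(rule):
    # Stand-in for keyphrase-based clustering used to index rules.
    return rule.split()[0]

def deep_policy_research(spec, n_rounds=3, threshold=0.5):
    """Iteratively search, distill, filter, and index policy rules."""
    index = defaultdict(list)
    for _ in range(n_rounds):
        for query in propose_queries(spec, index):
            for chunk in search(query):
                for rule in extract_rules(chunk):
                    if score_relevance(rule, spec) >= threshold:
                        index[keyphrase(rule)].append(rule)
    return dict(index)
```

Under this sketch, `deep_policy_research("advertisement")` would return a dictionary mapping keyphrase clusters to lists of rules, which corresponds to the indexed policy document in Figure 1; the actual prompts and indexing details are given in the released code.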