PolicyPad: Collaborative Prototyping of LLM Policies

K. J. Kevin Feng, Tzu-Sheng Kuo, Quan Ze Chen, Inyoung Cheong, Kenneth Holstein, Amy X. Zhang

TL;DR

PolicyPad is an interactive system that supports the emerging practice of LLM policy prototyping by drawing on established UX prototyping practices, including heuristic evaluation and storyboarding. In evaluation workshops, PolicyPad enhanced collaborative dynamics during policy design, enabled tight feedback loops, and led to novel policy contributions.

Abstract

As LLMs gain adoption in high-stakes domains like mental health, domain experts are increasingly consulted to provide input into policies governing their behavior. From observing 19 policymaking workshops with 9 experts over 15 weeks, we identified opportunities to better support rapid experimentation, feedback, and iteration in collaborative policy design processes. We present PolicyPad, an interactive system that facilitates the emerging practice of LLM policy prototyping by drawing from established UX prototyping practices, including heuristic evaluation and storyboarding. Using PolicyPad, policy designers can collaborate on drafting a policy in real time while independently testing policy-informed model behavior with usage scenarios. We evaluate PolicyPad through workshops with 22 domain experts in mental health and law, organized into 8 groups, finding that PolicyPad enhanced collaborative dynamics during policy design, enabled tight feedback loops, and led to novel policy contributions. Overall, our work paves expert-informed paths for advancing AI alignment and safety.

Paper Structure

This paper contains 65 sections, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Research Process Overview. Our work proceeded in 4 phases: (1) a 15-week observational study with 9 mental health experts (19 workshops) led to (2) conceptualization of LLM policy prototyping. We then (3) designed and built PolicyPad and (4) evaluated it through 8 policy prototyping sessions with 22 experts (10 mental health, 12 legal).
  • Figure 2: Timeline of activities in our 15-week observational study. During taxonomy development, experts organized and taxonomized a collection of diverse LLM usage scenarios. During rule writing, experts drafted, discussed, and refined rules to govern LLM behavior. During co-design, experts interacted with and gave feedback on prototypes we built of a tool for collaborative policy design. There was fluid movement between taxonomy development and rule writing, such that some sessions included activities pertaining to both goals.
  • Figure 3: Illustration of our envisioned LLM policy prototyping process. Scenarios inform desiderata for the policy via heuristics, which in turn guide the design of the policy. The policy shapes the behavior of a policy-informed LLM, which designers can then test against the scenarios to observe changes in behavior. The process is iterative: feedback from testing may lead to the creation of new scenarios, heuristics, and policy statements.
  • Figure 4: Main components of the PolicyPad system. Users can keep track of their policy version in the left sidebar (A) as they collaboratively edit the policy in the editor (B). Users can access scenarios via the scenario gallery (C). When they click into a scenario, they can view its full details and explore how the policy-informed model will behave on it using the scenario sidebar (D).
  • Figure 5: Scenarios can be brought into the editor inline with the policy as interactive widgets via referencing the scenario's title with the '@' symbol. Once in the editor, all users can click on it, view it in their scenario sidebar, and flag responses for group discussion.
  • ...and 8 more figures
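
The iterative loop described in the Figure 3 caption (scenarios inform heuristics, heuristics guide the policy, and the policy-informed model is tested against the scenarios) can be illustrated with a minimal sketch. This is not PolicyPad's implementation: the names `Scenario`, `Policy`, `run_scenarios`, `stub_model`, and `mentions_crisis_line` are hypothetical, and the model is a hard-coded stand-in rather than a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Scenario:
    """A usage scenario to test the policy against (hypothetical structure)."""
    title: str
    user_message: str


@dataclass
class Policy:
    """A versioned list of policy statements (hypothetical structure)."""
    statements: List[str] = field(default_factory=list)
    version: int = 1

    def revise(self, statement: str) -> None:
        # Each revision appends a statement and bumps the version.
        self.statements.append(statement)
        self.version += 1


# Assumed signatures: a model maps (policy text, user message) to a response;
# a heuristic judges whether a response is acceptable for a scenario.
Model = Callable[[str, str], str]
Heuristic = Callable[[Scenario, str], bool]


def run_scenarios(policy: Policy, scenarios: List[Scenario],
                  model: Model, heuristic: Heuristic) -> List[str]:
    """Return the titles of scenarios whose response fails the heuristic."""
    policy_text = "\n".join(policy.statements)
    flagged = []
    for scenario in scenarios:
        response = model(policy_text, scenario.user_message)
        if not heuristic(scenario, response):
            flagged.append(scenario.title)
    return flagged


def stub_model(policy_text: str, user_message: str) -> str:
    # Stand-in for a policy-informed LLM: it only reacts to one phrase.
    if "crisis line" in policy_text:
        return "Please reach out to a crisis line."
    return "Here is some general advice."


def mentions_crisis_line(scenario: Scenario, response: str) -> bool:
    # Example heuristic: the response should point to a crisis line.
    return "crisis line" in response


scenarios = [Scenario("Late-night distress", "I feel hopeless tonight.")]
policy = Policy(["Respond with empathy."])

# First test pass: the draft policy fails the heuristic.
first_pass = run_scenarios(policy, scenarios, stub_model, mentions_crisis_line)

# Feedback from testing leads to a new policy statement, then a re-test.
policy.revise("Always mention a crisis line when a user expresses distress.")
second_pass = run_scenarios(policy, scenarios, stub_model, mentions_crisis_line)
```

In this toy run, the scenario is flagged on the first pass and clears after the revision. In practice the heuristic judgment would come from the domain experts themselves, not from a string check.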