ProsocialDialog: A Prosocial Backbone for Conversational Agents

Hyunwoo Kim, Youngjae Yu, Liwei Jiang, Ximing Lu, Daniel Khashabi, Gunhee Kim, Yejin Choi, Maarten Sap

TL;DR

ProsocialDialog tackles the unsafe-response problem in conversational AI by introducing a large-scale dataset of 58K multi-turn dialogues where agents respond to problematic content using rules-of-thumb (RoTs). It introduces Canary, a RoT-generating safety module, and Prost, a RoT-grounded dialogue agent trained on ProsocialDialog and additional corpora. Empirical results show Canary improves RoT quality and grounding, while Prost delivers more prosocial, coherent responses than strong baselines, both in-domain and out-of-domain, with RoT guidance enhancing zero-shot performance in large language models. The work emphasizes grounding dialogue safety in social norms, enabling scalable, adaptable, and ethically mindful conversational agents, while acknowledging cultural biases and ethical considerations. Overall, it provides a concrete data-driven path toward socially responsible AI and highlights areas for future governance and refinement.

Abstract

Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them. To address this issue, we introduce ProsocialDialog, the first large-scale multi-turn dialogue dataset to teach conversational agents to respond to problematic content following social norms. Covering diverse unethical, problematic, biased, and toxic situations, ProsocialDialog contains responses that encourage prosocial behavior, grounded in commonsense social rules (i.e., rules-of-thumb, RoTs). Created via a human-AI collaborative framework, ProsocialDialog consists of 58K dialogues, with 331K utterances, 160K unique RoTs, and 497K dialogue safety labels accompanied by free-form rationales. With this dataset, we introduce a dialogue safety detection module, Canary, capable of generating RoTs given conversational context, and a socially-informed dialogue agent, Prost. Empirical results show that Prost generates more socially acceptable dialogues compared to other state-of-the-art language and dialogue models in both in-domain and out-of-domain settings. Additionally, Canary effectively guides conversational agents and off-the-shelf language models to generate significantly more prosocial responses. Our work highlights the promise and importance of creating and steering conversational AI to be socially responsible.

Paper Structure

This paper contains 51 sections, 10 figures, and 9 tables.

Figures (10)

  • Figure 1: (a) Sample responses from existing state-of-the-art conversational models [brown2020gpt3; roller2021blender; zhang2022opt] to a problematic context. (b) An example dialogue from ProsocialDialog. At each turn, the task is to (1) first determine dialogue safety labels (§\ref{subsec:safety_collection}), (2) then infer relevant rules-of-thumb (RoTs) for problematic contexts, and (3) finally generate constructive feedback based on the RoTs (§\ref{subsubsec:collecting_feedback}).
  • Figure 2: The overall pipeline for collecting ProsocialDialog.
  • Figure 3: Ratio of positive, ambiguous, and negative utterances in large-scale dialogue datasets and in ProsocialDialog, measured by the pretrained BERT sentiment classifier from [demszky2020goemotions].
  • Figure 4: The overall ratio and turn dynamics of dialogue safety labels in ProsocialDialog. We include the actual proportions (%) inside the bars.
  • Figure 5: Results of head-to-head comparisons between models with and without Canary on ProsocialDialog via human judgements (§\ref{subsec:zeroshot_plms}).
  • ...and 5 more figures