ReNoVi: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations

Haolan Zhan, Zhuang Li, Xiaoxi Kang, Tao Feng, Yuncheng Hua, Lizhen Qu, Yi Ying, Mei Rianto Chandra, Kelly Rosalin, Jureynolds Jureynolds, Suraj Sharma, Shilin Qu, Linhao Luo, Lay-Ki Soon, Zhaleh Semnani Azad, Ingrid Zukerman, Gholamreza Haffari

TL;DR

ReNoVi introduces a Chinese socio-cultural dialogue benchmark for remediating norm violations. It comprises 9,258 multi-turn dialogues (512 human-authored, 8,746 synthetic) and defines four tasks grounded in Expectancy Violation Theory (EVT) and Interaction Adaptation Theory (IAT): norm violation detection, impact estimation, remediation generation, and justification generation. The dataset combines human-authored and ChatGPT-generated data, both to address data scarcity and to probe how well LLMs align with humans in social-norm awareness; a quality-control protocol is applied to both parts. Experiments show that synthetic data alone does not improve performance, but merging it with a modest amount of human-authored data improves violation detection. On the generation tasks, the best-performing model for remediation differs from the best for justification, underscoring the value of diverse adapters and prompting strategies. Limitations include a monolingual focus on Chinese norms, a lack of tailored baseline models, and potential misuse; the misuse concerns are addressed through ethical guidelines and careful annotation practices. Overall, ReNoVi enables systematic evaluation and development of norm-aware dialogue systems, with practical implications for safer, culturally sensitive AI interactions.

Abstract

Norm violations occur when individuals fail to conform to culturally accepted behaviors, which may lead to conflicts. Remediating norm violations requires social awareness and cultural sensitivity to the nuances at play. To equip interactive AI systems with a remediation ability, we offer ReNoVi, a large-scale corpus of 9,258 multi-turn dialogues annotated with social norms, and define a sequence of tasks to help understand and remediate norm violations step by step. ReNoVi consists of two parts: 512 human-authored dialogues (real data) and 8,746 synthetic conversations generated by ChatGPT through prompt learning. While collecting sufficient human-authored data is costly, synthetic conversations provide suitable amounts of data to help mitigate the scarcity of training data, as well as the chance to assess the alignment between LLMs and humans in the awareness of social norms. We thus harness the power of ChatGPT to generate synthetic training data for our task. To ensure the quality of both human-authored and synthetic data, we follow a quality control protocol during data collection. Our experimental results demonstrate the importance of remediating norm violations in socio-cultural conversations, as well as the improvement in performance obtained from synthetic data.

Paper Structure (34 sections, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Main tasks of our framework (left): (1) norm violation detection, (2) violation impact estimation, (3) remediation generation, and (4) justification generation. Each dialogue (right) contains a corresponding dialogue scenario related to social norms. The detailed norm categories and rules are presented in Appendix \ref{sec:def-social-norm}.
  • Figure 2: Norm category distributions of synthetic (left) and human-authored (right) data in the ReNoVi dataset.
  • Figure 3: Distribution divergences between the embeddings (t-SNE) of synthetic (green) and human-written (red) data in terms of (a) dialogue session, (b) remediation sentence, and (c) justification sentence.
  • Figure 4: Case study of the remediations and justifications generated by different LLMs. The corresponding English translation is given in Appendix \ref{sec:case_study_en}.
  • Figure 5: The English version of the case study of the remediations and justifications generated by different LLMs. Please note that the translation was produced by ChatGPT.
  • ...and 2 more figures