GrounDial: Human-norm Grounded Safe Dialog Response Generation
Siwon Kim, Shuyang Dai, Mohammad Kachuee, Shayan Ray, Tara Taghavi, Sungroh Yoon
TL;DR
GrounDial reduces unsafe dialogue outputs by grounding responses in human-norm Rules-of-Thumb (RoT), without additional fine-tuning. It combines two forms of grounding: explicit grounding, where an RoT retrieved from a curated set R is supplied to the model through in-context learning (ICL), and implicit grounding at decoding time through human-norm-guided decoding (HGD), a strategy inspired by KID. Experiments with BlenderBot on ProsocialDialog show higher safety and RoT agreement than the vanilla baseline and results competitive with fine-tuned models, with the largest gains when ICL and HGD are used together. Because it relies on human norms and RL-inspired decoding rather than retraining a large model, the method offers a cost-efficient and generalizable route to safer dialog systems.
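To make the two grounding steps concrete, here is a minimal, self-contained sketch. It is an illustration under stated assumptions, not the authors' implementation: the bag-of-words retriever, the prompt template, the function names, and the additive logit bias are all hypothetical stand-ins. In particular, the additive bias only mimics the idea of steering decoding toward the RoT and is not the RL-inspired update that HGD uses.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase a string and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_rot(context, rot_set):
    """Pick the RoT in rot_set most similar to the dialogue context.

    Assumption: a toy lexical retriever; a real system would likely
    use a learned sentence encoder instead.
    """
    ctx = tokenize(context)
    return max(rot_set, key=lambda rot: cosine(ctx, tokenize(rot)))


def build_prompt(context, rot):
    """Explicit grounding: place the RoT in front of the user context.

    Assumption: this template is hypothetical, not the paper's format.
    """
    return f"{rot} {context}"


def bias_logits(logits, rot, alpha=1.0):
    """Implicit grounding stand-in: raise scores of tokens found in the RoT.

    Assumption: a simple additive bias used only for illustration.
    """
    rot_tokens = set(tokenize(rot))
    return {
        tok: score + (alpha if tok in rot_tokens else 0.0)
        for tok, score in logits.items()
    }


# Toy data, invented for this example.
rot_set = [
    "It is wrong to cheat on exams.",
    "You should be kind to animals.",
]
context = "I plan to cheat on my math exams tomorrow."

rot = retrieve_rot(context, rot_set)
prompt = build_prompt(context, rot)
biased = bias_logits({"sure": 1.0, "wrong": 0.5}, rot, alpha=1.0)
```

In this toy run the retriever selects the exam-cheating RoT, the prompt carries that RoT ahead of the user turn, and the bias lifts the candidate token "wrong" above "sure". A real decoder would apply such steering at every generation step over the full vocabulary.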
Abstract
Current conversational AI systems based on large language models (LLMs) are known to generate unsafe responses, either agreeing with offensive user input or producing toxic content. Previous research has aimed to reduce this toxicity by fine-tuning LLMs on manually annotated safe dialogue histories. However, this dependence on additional tuning incurs substantial cost. To remove the dependence, we propose GrounDial, which achieves response safety by grounding responses in commonsense social rules without requiring fine-tuning. GrounDial's hybrid approach, which combines in-context learning with human-norm-guided decoding, yields responses that are quantitatively and qualitatively safer, even without additional data or tuning.
