Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
Pei Zhou, Karthik Gopalakrishnan, Behnam Hedayatnia, Seokhwan Kim, Jay Pujara, Xiang Ren, Yang Liu, Dilek Hakkani-Tur
TL;DR
The paper introduces Think-Before-Speaking (TBS), a framework for open-domain dialogue that explicitly generates implicit commonsense knowledge before generating a response. TBS couples a knowledge-generation step with the response generator. The result is more informative and contextually grounded responses, and the generated knowledge doubles as an explanation of why the model responded as it did. The authors build weakly supervised, knowledge-aligned dialogues from ConceptNet and explore two natural-language (NL) knowledge representations. TBS outperforms end-to-end response generation (RG) and several knowledge-augmented baselines on most automatic metrics and in human evaluations, and annotators judge its generated knowledge to be sensible and relevant about 85% of the time. The results suggest that externalizing implicit knowledge can improve learning efficiency, generation quality, and interpretability, while also letting the model produce novel, relevant knowledge. This work points toward more human-like grounding in conversational AI and highlights the importance of knowledge quality and structured representations in grounding decisions.
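To make "weakly supervised, knowledge-aligned dialogues" concrete, here is a minimal sketch of one plausible alignment heuristic. It assumes a ConceptNet triple is kept when its head concept is mentioned in the dialogue context and its tail concept is mentioned in the response. The toy knowledge base `TOY_KB`, the helpers `concepts` and `align`, and the single-word, lowercase-token concept matching are all hypothetical simplifications for illustration, not the paper's actual pipeline.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical toy subset of ConceptNet-style edges (the real graph is far larger).
TOY_KB = [
    Triple("coffee", "UsedFor", "energy"),
    Triple("tired", "Causes", "sleep"),
    Triple("coffee", "AtLocation", "cafe"),
]


def concepts(text: str) -> set[str]:
    """Crude concept extraction: lowercase single-word tokens with edge punctuation stripped."""
    return {tok.strip(".,!?;:'\"").lower() for tok in text.split()} - {""}


def align(context: str, response: str, kb: list[Triple]) -> list[Triple]:
    """Keep triples whose head appears in the context and whose tail appears in the response."""
    ctx, resp = concepts(context), concepts(response)
    return [t for t in kb if t.head in ctx and t.tail in resp]


aligned = align("I need some coffee, I'm so tired.", "Coffee will give you energy!", TOY_KB)
print(aligned)
```

Here only `coffee UsedFor energy` bridges the context and the response, so it becomes the implicit knowledge for that turn. A real pipeline would also need multi-word concept matching and lemmatization, which this sketch omits.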
Abstract
Implicit knowledge, such as common sense, is key to fluid human conversations. Current neural response generation (RG) models are trained to generate responses directly, omitting unstated implicit knowledge. In this paper, we present Think-Before-Speaking (TBS), a generative approach to first externalize implicit commonsense knowledge (think) and use this knowledge to generate responses (speak). We expect that externalizing implicit knowledge allows more efficient learning, produces more informative responses, and enables more explainable models. We analyze different choices to collect knowledge-aligned dialogues, represent implicit knowledge, and transition between knowledge and dialogues. Empirical results show TBS models outperform end-to-end and knowledge-augmented RG baselines on most automatic metrics and generate more informative, specific, and commonsense-following responses, as evaluated by human annotators. TBS also generates knowledge that makes sense and is relevant to the dialogue around 85% of the time.
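The think-then-speak idea can be illustrated by how a single training target might be serialized: first the verbalized knowledge, then the response, separated by transition markers. The following is a minimal sketch under stated assumptions. The relation templates, the `<knowledge>`/`</knowledge>` markers, and the helpers `verbalize`, `build_example`, and `split_output` are hypothetical choices, not necessarily the representations or transitions the authors used. The generator model itself is out of scope.

```python
# Hypothetical relation-to-phrase templates for turning triples into NL statements.
TEMPLATES = {
    "UsedFor": "{h} is used for {t}",
    "Causes": "{h} causes {t}",
    "AtLocation": "{h} is located at {t}",
}

# Hypothetical transition markers between the "think" and "speak" segments.
K_OPEN, K_CLOSE = "<knowledge>", "</knowledge>"


def verbalize(head: str, relation: str, tail: str) -> str:
    """Render one triple as a natural-language statement via its template."""
    return TEMPLATES[relation].format(h=head, t=tail)


def build_example(context: str, triples: list[tuple[str, str, str]], response: str) -> tuple[str, str]:
    """Return a (source, target) pair; the target externalizes knowledge before the response."""
    knowledge = "; ".join(verbalize(*t) for t in triples)
    return context, f"{K_OPEN} {knowledge} {K_CLOSE} {response}"


def split_output(generated: str) -> tuple[str, str]:
    """At inference time, separate generated knowledge from the response text."""
    start = generated.find(K_OPEN)
    if start == -1:
        # No knowledge segment: treat the whole output as the response.
        return "", generated.strip()
    start += len(K_OPEN)
    end = generated.find(K_CLOSE, start)
    if end == -1:
        # Knowledge segment never closed: no response was produced.
        return generated[start:].strip(), ""
    return generated[start:end].strip(), generated[end + len(K_CLOSE):].strip()


src, tgt = build_example(
    "I need some coffee, I'm so tired.",
    [("coffee", "UsedFor", "energy")],
    "Coffee will give you energy!",
)
knowledge, response = split_output(tgt)
```

Because the knowledge is generated as ordinary text ahead of the response, it can be inspected as an explanation for that response. Splitting at explicit markers keeps the two segments easy to separate after decoding.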
