Hexa: Self-Improving for Knowledge-Grounded Dialogue System

Daejin Jo, Daniel Wontae Nam, Gunsoo Han, Kyoung-Woon On, Taehwan Kwon, Seungeun Rho, Sungwoong Kim

TL;DR

This work develops a self-improving method that improves the generation of intermediate steps without ground-truth data, and proposes a novel bootstrapping scheme with a guided prompt and a modified loss function to increase the diversity of appropriate self-generated responses.

Abstract

A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web search, memory retrieval) with modular approaches. However, data for such steps are often less accessible than dialogue responses, because they are unobservable in an ordinary dialogue. To compensate for the absence of these data, we develop a self-improving method that improves the generative performance of intermediate steps without ground-truth data. In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses. Through experiments on various benchmark datasets, we empirically demonstrate that our method successfully leverages a self-improving mechanism in generating intermediate and final responses and improves performance on the task of knowledge-grounded dialogue generation.

Paper Structure (28 sections, 4 equations, 4 figures, 15 tables)

Figures (4)

  • Figure 1: Example of external knowledge-grounded inference of our model. Here, we show an illustrative example of how the model infers the intermediate steps for external knowledge-grounded dialogue response generation. Following the same scheme as BB3 (Shuster et al., 2022), given an input context with the special token __is-search-required__ appended, the model decides whether to search by outputting __do-search__ or __do-not-search__. Upon deciding to search, the model generates a search query that is sent to an external knowledge source, such as the web, to retrieve relevant documents. For query generation, the special token __generate-query__ is appended to the end of the original context. With the retrieved documents, the model then generates a knowledge piece for the context using the special token __generate-knowledge__. Finally, with the generated knowledge appended to the context, the model generates the response for the given context.
  • Figure 2: Schematic diagram of Hexa at iteration $t$. (Left) The overall flow of data bootstrapping and finetuning in Hexa. Given a dialogue context and response pair $(x_i, y_i)$ sampled from the dataset $\mathcal{D}$, Hexa runs through the bootstrapping phases shown in the gray shaded area. The model is then finetuned on the bootstrapped data and the process repeats. (Right) A more detailed sketch of Hexa. With input $x_i$, the model generates intermediate steps $z_1$, $z_2$, and $z_3$, and a response $y$. (Right-Top) Because of a misinformed intermediate step, $y$ (red) is rejected by the matching function and is added to the response set $h^t_i$. (Right-Bottom) The model generates a response again with a guided prompt, highlighted in green below $x_i$. This time, $z_3$ is well aligned with $x_i$, leading to a correct response. The sample $(x_i, \mathbf{z}, y)$ (blue) is then stored in the bootstrapped data on which the model is finetuned.
  • Figure 3: Graphical model of latent variables. Given the dialogue context $x$, $z_1 \sim p_{\theta}(\cdot \vert x)$ and $z_2 \sim p_{\theta}(\cdot \vert x, z_1)$ are the search query and the search knowledge, respectively. The search query is used to retrieve external knowledge from sources such as the web, and the search knowledge is generated from the retrieved external knowledge and $x$. $z_3 \sim p_{\theta}(\cdot \vert x)$ is the entity knowledge, generated using only the dialogue context $x$. Finally, $z_4 \sim p_{\theta}(\cdot \vert x)$ is the internal knowledge retrieved from the dialogue history, conditioned on $x$. After these intermediate steps are generated, the final response $y \sim p_{\theta}(\cdot \vert x, z_{2:4})$ is generated conditionally on them.
  • Figure 4: Conceptual illustration of curriculum learning in Hexa. Here, the question "What animal says 'coin coin', according to the French?" is given with the ground truth "Duck". The model at iteration $t$ produces the wrong response "Lion" but attempts again with a guided prompt $h_i^t=\{\text{Duck}, \text{Lion}\}$, and this time the response is correct. After $n$ iterations, the model is asked the same question again with an expanded set $h_i^{t+n}=\{\text{Swan}, \text{Duck}, \text{Lion}, ... \}$ and outputs "Mallard Drake". Since a Mallard Drake is a kind of duck, it can also count as a ground-truth output, and Hexa includes it in the training set.
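
The control-token flow described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `respond`, `toy_generate`, and `toy_search` are hypothetical names, the way documents and context are concatenated is a guess, and the toy stubs stand in for the actual language model and search engine. Only the special tokens are taken from the caption.

```python
from typing import Callable, List

# Control tokens quoted in the Figure 1 caption.
IS_SEARCH_REQUIRED = "__is-search-required__"
DO_SEARCH = "__do-search__"
DO_NOT_SEARCH = "__do-not-search__"
GENERATE_QUERY = "__generate-query__"
GENERATE_KNOWLEDGE = "__generate-knowledge__"


def respond(context: str,
            generate: Callable[[str], str],
            search: Callable[[str], List[str]]) -> str:
    """One pass of decide -> query -> knowledge -> response (hypothetical helper)."""
    # Step 1: ask the model whether a search is needed.
    decision = generate(f"{context} {IS_SEARCH_REQUIRED}")
    if decision.strip() != DO_SEARCH:
        return generate(context)
    # Step 2: generate a search query and retrieve documents.
    query = generate(f"{context} {GENERATE_QUERY}")
    documents = search(query)
    # Step 3: condense the documents into a knowledge piece.
    # (The prompt layout here is an assumption, not the paper's format.)
    knowledge = generate("\n".join(documents + [context, GENERATE_KNOWLEDGE]))
    # Step 4: respond with the knowledge appended to the context.
    return generate(f"{context} {knowledge}")


def toy_generate(prompt: str) -> str:
    """Rule-based stand-in for the language model (assumption)."""
    if prompt.endswith(IS_SEARCH_REQUIRED):
        return DO_SEARCH
    if prompt.endswith(GENERATE_QUERY):
        return "what sound do ducks make in French"
    if prompt.endswith(GENERATE_KNOWLEDGE):
        return "In French, ducks say 'coin coin'."
    return "A duck." if "ducks say" in prompt else "I am not sure."


def toy_search(query: str) -> List[str]:
    """Stand-in for a web search call (assumption)."""
    return ["In French, ducks say 'coin coin'."]


question = "What animal says 'coin coin', according to the French?"
print(respond(question, toy_generate, toy_search))  # prints "A duck."
```

Passing `generate` and `search` as callables keeps the sketch independent of any particular model or retrieval backend.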
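
The bootstrapping step sketched in Figures 2 and 4 (generate, check with a matching function, retry with a guided prompt, keep accepted samples) might look roughly like the following. All names are hypothetical, the substring-based `toy_matches` is a placeholder for whatever matching function the paper uses, and the assumption that the response set starts with the ground truth and grows with rejected responses is inferred from the captions only. The modified loss function is not shown.

```python
from typing import Callable, List, Optional, Set, Tuple

# (context x, intermediate steps z, response y)
Sample = Tuple[str, List[str], str]
Generator = Callable[[str, Optional[List[str]]], Tuple[List[str], str]]


def bootstrap_example(x: str,
                      y_true: str,
                      generate: Generator,
                      matches: Callable[[str, str], bool],
                      response_set: Set[str]) -> Optional[Sample]:
    """Try to obtain one accepted (x, z, y) sample; response_set plays the role of h_i^t."""
    # First attempt: no guidance.
    z, y = generate(x, None)
    if matches(y, y_true):
        return (x, z, y)
    # Rejected: remember the response and retry with a guided prompt.
    response_set.add(y)
    z, y = generate(x, sorted(response_set))
    if matches(y, y_true):
        return (x, z, y)
    response_set.add(y)  # assumption: failed guided attempts are also recorded
    return None


def toy_matches(y: str, y_true: str) -> bool:
    """Placeholder matching function: case-insensitive containment (assumption)."""
    return y_true.lower() in y.lower()


def toy_generate(x: str, hint: Optional[List[str]]) -> Tuple[List[str], str]:
    """Stand-in model: wrong without guidance, right with it (assumption)."""
    if hint is None:
        return (["lions roar"], "Lion")
    return (["ducks say 'coin coin' in French"], "Duck")


question = "What animal says 'coin coin', according to the French?"
h = {"Duck"}  # assumed to start with the ground truth, as in Figure 4
sample = bootstrap_example(question, "Duck", toy_generate, toy_matches, h)
print(sample)
print(sorted(h))  # prints ['Duck', 'Lion']
```

Accepted samples would then be collected across the dataset and used for finetuning before the next iteration, with `h` carried over so that it can expand over time as in Figure 4.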
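
Reading the conditionals off the Figure 3 caption, the joint distribution over the intermediate steps and the response would factorize as below. This is a derivation from the caption alone, assuming no dependencies beyond those it lists; the paper's own equations may present it differently.

```latex
p_{\theta}(y, z_{1:4} \mid x)
  = p_{\theta}(z_1 \mid x)\,
    p_{\theta}(z_2 \mid x, z_1)\,
    p_{\theta}(z_3 \mid x)\,
    p_{\theta}(z_4 \mid x)\,
    p_{\theta}(y \mid x, z_{2:4})
```

Note that $z_1$ influences $y$ only through $z_2$: the search query is used to obtain the search knowledge but is not itself part of the response's conditioning set.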