
Semantically Conditioned Dialog Response Generation via Hierarchical Disentangled Self-Attention

Wenhu Chen, Jianshu Chen, Pengda Qin, Xifeng Yan, William Yang Wang

TL;DR

This work tackles the scalability of semantically conditioned neural response generation in multi-domain dialogue by representing dialog acts as a hierarchical graph. It introduces a hierarchical disentangled self-attention (HDSA) mechanism that binds attention heads to nodes on the act graph and activates them along the predicted act path to control generation. The graph-based act representation reduces sample complexity and improves generalization, achieving significant gains on MultiWOZ in both automatic metrics and human evaluations. The paper also discusses transfer-learning potential and compression-versus-expressiveness trade-offs, outlining future work to infer dialog acts from responses in partially supervised settings.

Abstract

Semantically controlled neural response generation in limited domains has achieved strong performance. However, moving towards large-scale multi-domain scenarios has been shown to be difficult, because the number of possible combinations of semantic inputs grows exponentially with the number of domains. To alleviate this scalability issue, we exploit the structure of dialog acts to build a multi-layer hierarchical graph, where each act is represented as a root-to-leaf route on the graph. We then incorporate this graph structure as an inductive bias to build a hierarchical disentangled self-attention network, in which attention heads are disentangled to model designated nodes on the dialog act graph. By activating different (disentangled) heads at each layer, combinatorially many dialog act semantics can be modeled to control neural response generation. On the large-scale Multi-Domain-WOZ dataset, our model yields significant improvements over the baselines on various automatic and human evaluation metrics.
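To make the graph representation concrete, the sketch below encodes a set of dialog acts as root-to-leaf routes through a three-level graph (domain, act type, slot) and compares its size with a flat encoding that enumerates every full triple. The level names and vocabulary are hypothetical. They were chosen only so that the hierarchical output has the same shape and values as the matrix A quoted in the Figure 5 caption. The flat encoding is likewise an assumption about what a non-hierarchical baseline looks like, not the paper's exact baseline.

```python
from itertools import product

# Hypothetical three-level vocabulary (domain -> act type -> slot).
# These labels are illustrative assumptions, not the paper's node set.
LEVELS = [
    ["hotel", "restaurant", "train", "attraction"],        # level 1: domains
    ["inform", "request", "recommend"],                    # level 2: act types
    ["name", "area", "price", "food", "phone", "time"],    # level 3: slots
]


def encode_hierarchical(acts):
    """Map (domain, act, slot) routes to one multi-hot vector per graph level.

    Nodes are shared across routes, so several acts can activate
    overlapping nodes instead of needing a unit of their own."""
    vectors = [[0] * len(level) for level in LEVELS]
    for route in acts:
        for depth, node in enumerate(route):
            vectors[depth][LEVELS[depth].index(node)] = 1
    return vectors


def encode_flattened(acts):
    """Assumed flat baseline: one indicator per full (domain, act, slot) triple."""
    triples = list(product(*LEVELS))
    vec = [0] * len(triples)
    for route in acts:
        vec[triples.index(tuple(route))] = 1
    return vec


acts = [("restaurant", "request", "price"), ("restaurant", "request", "food")]
print(encode_hierarchical(acts))  # [[0, 1, 0, 0], [0, 1, 0], [0, 0, 1, 1, 0, 0]]
print(sum(len(level) for level in LEVELS), len(encode_flattened(acts)))  # 13 72
```

Shared nodes keep the hierarchical form small (4 + 3 + 6 = 13 units against 4 × 3 × 6 = 72 triples). The encoding is also lossy: once routes span two domains, it no longer records which slot belongs to which domain. This is one concrete instance of the compression-versus-expressiveness trade-off mentioned in the TL;DR.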

Paper Structure

This paper contains 29 sections, 7 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: An example dialog from the MultiWOZ dataset, where the upper rectangle includes the dialog history, the tables at the bottom represent the external database, and the lower rectangle contains the dialog action and the language surface form that we need to predict.
  • Figure 2: The left part is the graph representation of the dialog acts, where each path in the graph denotes a unique dialog act. The right part denotes our proposed HDSA, where the orange nodes are activated while the others are blocked. (For details, see the architecture figure.)
  • Figure 3: Illustration of the neural dialog system. We decompose it into two parts: the lower part describes the dialog state tracking and DB query, and the upper part denotes the Dialog Action Prediction and Response Generation. In this paper, we are mainly interested in improving the performance of the upper part.
  • Figure 4: The left figure describes the tree representation of the dialog acts, and the right figure denotes the graph representation obtained from the left one by merging cross-branch nodes that have the same semantics. The hierarchical form is used in our main model, HDSA, while the flattened form is used for the baseline models.
  • Figure 5: The left figure describes the dialog act predictor and HDSA, and the right figure describes the details of DSA. The predicted hierarchical dialog acts are used to control the switch in HDSA at each layer. Here we use $L=3$ layers, the numbers of heads at each layer are $H=(4, 3, 6)$, and the hierarchical graph representation is $A=[[0, 1, 0, 0], [0, 1, 0], [0, 0, 1, 1, 0, 0]]$. We use $m$ to denote the dialog history length and $n$ the response length.
  • ...and 7 more figures
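The head switching described in the Figure 5 caption can be sketched in pure Python. This is a minimal sketch under our own assumptions. Each head at layer l is a standard scaled dot-product self-attention head bound to one graph node, and heads whose entry in A is 0 are skipped. The outputs of the active heads are averaged; the paper's exact combination rule may differ. Feed-forward sublayers, residual connections, layer normalization, and attention over the dialog history are omitted. The helper names (`disentangled_layer`, `hdsa_forward`, `random_matrix`) and the random, untrained weights are hypothetical. Only H and A are taken from the caption.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention_head(seq, Wq, Wk, Wv):
    """One scaled dot-product self-attention head over a list of d-dim vectors."""
    q = [matvec(Wq, x) for x in seq]
    k = [matvec(Wk, x) for x in seq]
    v = [matvec(Wv, x) for x in seq]
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    return out


def disentangled_layer(seq, heads, switch):
    """Hypothetical DSA layer: head i is bound to node i and runs only if switch[i] == 1.

    Assumption: active head outputs are averaged; blocked heads contribute nothing.
    """
    active = [h for h, on in zip(heads, switch) if on]
    if not active:  # no node active at this level: pass the input through
        return seq
    outs = [attention_head(seq, *h) for h in active]
    return [[sum(o[t][c] for o in outs) / len(outs) for c in range(len(seq[t]))]
            for t in range(len(seq))]


def random_matrix(d, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(d)]


def hdsa_forward(seq, H, A, d, seed=0):
    """Stack one DSA layer per level; layer l has H[l] heads gated by A[l]."""
    rng = random.Random(seed)  # untrained random weights, for shape checking only
    for n_heads, switch in zip(H, A):
        assert len(switch) == n_heads
        heads = [tuple(random_matrix(d, rng) for _ in range(3)) for _ in range(n_heads)]
        seq = disentangled_layer(seq, heads, switch)
    return seq


# Values from the Figure 5 caption: L = 3 layers, H = (4, 3, 6).
H = (4, 3, 6)
A = [[0, 1, 0, 0], [0, 1, 0], [0, 0, 1, 1, 0, 0]]
d = 4
rng = random.Random(1)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(5)]  # toy sequence
out = hdsa_forward(tokens, H, A, d)
print(len(out), len(out[0]))  # 5 4
```

Blocked heads are never evaluated, so in a trained version of this sketch their parameters would receive no gradient for that example. Each head therefore learns only from examples in which its node lies on the active act path, which is one way to read the "disentangled" design.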