Neural Belief Tracker: Data-Driven Dialogue State Tracking

Nikola Mrkšić, Diarmuid Ó Séaghdha, Tsung-Hsien Wen, Blaise Thomson, Steve Young

TL;DR

This work tackles the scalability bottleneck in dialogue state tracking by introducing the Neural Belief Tracker (NBT), a data-driven approach that reasons over pre-trained word vectors to jointly model user utterances and dialogue context without hand-crafted semantic lexicons. It presents two representation-learning variants (NBT-DNN and NBT-CNN) and a semantic decoding mechanism that directly evaluates candidate slot-value expressions within a given dialogue context. In experiments on DSTC2 and WOZ 2.0, NBT matches lexicon-based methods and outperforms them when lexical resources are unavailable, and its performance improves further with semantically specialized word vectors (Paragram-SL999). The results demonstrate NBT’s potential for scalable, domain-rich dialogue systems and highlight the importance of vector-space semantics for belief tracking over noisy and varied language.

Abstract

One of the core components of modern spoken dialogue systems is the belief tracker, which estimates the user's goal at every step of the dialogue. However, most current approaches have difficulty scaling to larger, more complex dialogue domains. This is due to their dependency on either: a) Spoken Language Understanding models that require large amounts of annotated training data; or b) hand-crafted lexicons for capturing some of the linguistic variation in users' language. We propose a novel Neural Belief Tracking (NBT) framework which overcomes these problems by building on recent advances in representation learning. NBT models reason over pre-trained word vectors, learning to compose them into distributed representations of user utterances and dialogue context. Our evaluation on two datasets shows that this approach surpasses past limitations, matching the performance of state-of-the-art models which rely on hand-crafted semantic lexicons and outperforming them when such lexicons are not provided.

Paper Structure

This paper contains 20 sections, 12 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Annotated dialogue states in a sample dialogue. Underlined words show rephrasings which are typically handled using semantic dictionaries.
  • Figure 2: An example semantic dictionary with rephrasings for three ontology values in a restaurant search domain.
  • Figure 3: Architecture of the NBT Model. The implementation of the three representation learning subcomponents can be modified, as long as these produce adequate vector representations which the downstream model components can use to decide whether the current candidate slot-value pair was expressed in the user utterance (taking into account the preceding system act).
  • Figure 4: NBT-DNN Model. Word vectors of $n$-grams ($n=1,2,3$) are summed to obtain cumulative $n$-grams, then passed through another hidden layer and summed to obtain the utterance representation $\mathbf{r}$.
  • Figure 5: NBT-CNN Model. $L$ convolutional filters of window sizes $1,2,3$ are applied to word vectors of the given utterance ($L=3$ in the diagram, but $L=300$ in the system). The convolutions are followed by the ReLU activation function and max-pooling to produce summary $n$-gram representations. These are summed to obtain the utterance representation $\mathbf{r}$.
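
The two utterance encoders described in the captions of Figures 4 and 5 can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. Several details are assumptions the captions do not state: an $n$-gram vector is formed by concatenating $n$ consecutive word vectors, the NBT-DNN hidden layer uses a sigmoid, and the weights are random toy values. The function names (`ngram_windows`, `dnn_representation`, `cnn_representation`) and the toy sizes `D = 4` and `L = 5` are hypothetical (the caption of Figure 5 gives $L=300$ for the actual system).

```python
import math
import random


def ngram_windows(word_vectors, n):
    """Concatenate each run of n consecutive word vectors into one flat vector.

    Assumption: n-gram vectors are built by concatenation.
    """
    return [
        [x for vec in word_vectors[i:i + n] for x in vec]
        for i in range(len(word_vectors) - n + 1)
    ]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dnn_representation(word_vectors, weights, biases):
    """NBT-DNN-style encoder: sum the n-gram vectors for each n, pass each
    sum through its own hidden layer, then sum the three results."""
    r = [0.0] * len(biases[1])
    for n in (1, 2, 3):
        windows = ngram_windows(word_vectors, n)
        if not windows:  # utterance shorter than n words
            continue
        cumulative = [sum(column) for column in zip(*windows)]
        pre_activation = matvec(weights[n], cumulative)
        hidden = [sigmoid(z + b) for z, b in zip(pre_activation, biases[n])]
        r = [acc + h for acc, h in zip(r, hidden)]
    return r


def cnn_representation(word_vectors, filters):
    """NBT-CNN-style encoder: apply L filters per window size, then ReLU,
    max-pool over positions, and sum across window sizes."""
    r = [0.0] * len(filters[1])
    for n in (1, 2, 3):
        windows = ngram_windows(word_vectors, n)
        if not windows:
            continue
        feature_maps = [
            [max(0.0, z) for z in matvec(filters[n], window)]
            for window in windows
        ]
        pooled = [max(column) for column in zip(*feature_maps)]
        r = [acc + p for acc, p in zip(r, pooled)]
    return r


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy demo with random stand-ins for pre-trained word vectors.
rng = random.Random(0)
D, L = 4, 5  # word-vector size and number of filters / hidden units (toy values)
utterance = [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(6)]
weights = {n: random_matrix(L, n * D, rng) for n in (1, 2, 3)}
biases = {n: [0.0] * L for n in (1, 2, 3)}

r_dnn = dnn_representation(utterance, weights, biases)
r_cnn = cnn_representation(utterance, weights)  # same matrices reused as filters
print(len(r_dnn), len(r_cnn))  # 5 5
```

In both sketches the utterance representation has a fixed length regardless of how many words the utterance contains. The difference is where the reduction over positions happens: the DNN variant sums $n$-gram vectors before the non-linearity, whereas the CNN variant applies the filters at every position and keeps only the strongest response per filter.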