AMUSED: A Multi-Stream Vector Representation Method for Use in Natural Dialogue

Gaurav Kumar, Rishabh Joshi, Jaspreet Singh, Promod Yenigalla

TL;DR

AMUSED tackles the challenge of coherent chit-chat by learning unified query–response embeddings through a multi-stream pipeline that fuses semantic, syntactic, contextual, and external knowledge. The model combines a Bi-LSTM and a syntactic Graph Convolution Network with a Transformer-based next-dialogue predictor, KB-neighbor embeddings, and a memory network, trained via a triplet loss that pulls correct responses closer to the query than negatives. Empirical results on Persona-Chat (and supplementary datasets) show improved next-dialogue prediction accuracy and retrieval quality, corroborated by automated metrics and expert human evaluation. This retrieval-based approach improves discourse continuity and non-monotonic engagement, offering a scalable strategy adaptable to other conversational tasks.
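The triplet objective can be sketched as a hinge on distances: the loss is zero once the correct response sits closer to the query than the negative by at least a margin. This is a minimal illustration, not the authors' code; the Euclidean distance, the `margin=1.0` default, and the helper names `l2_distance` and `triplet_loss` are assumptions.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(query, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the positive response is closer to the
    query than the negative response by at least `margin` (assumed value)."""
    d_pos = l2_distance(query, positive)
    d_neg = l2_distance(query, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-d embeddings, made up for illustration (not model outputs).
q = [0.0, 0.0]
good = [1.0, 0.0]  # distance 1 from q
bad = [3.0, 4.0]   # distance 5 from q
print(triplet_loss(q, good, bad))  # 1 - 5 + 1 < 0, clamped -> 0.0
print(triplet_loss(q, bad, good))  # 5 - 1 + 1 -> 5.0
```

Only triplets that violate the margin produce a gradient, so training concentrates on negatives that still sit too close to the query.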

Abstract

The problem of building a coherent and non-monotonous conversational agent with proper discourse and coverage is still an area of open research. Current architectures account only for semantic and contextual information for a given query and fail to fully account for syntactic and external knowledge, which are crucial for generating responses in a chit-chat system. To overcome this problem, we propose an end-to-end multi-stream deep learning architecture that learns unified embeddings for query-response pairs by leveraging contextual information from memory networks and syntactic information by applying Graph Convolution Networks (GCN) over their dependency parses. One stream of this network also uses transfer learning, pre-training a bidirectional transformer to extract a semantic representation of each input sentence, and incorporates external knowledge through the neighborhood of the entities in a Knowledge Base (KB). We benchmark these embeddings on the next sentence prediction task, where they significantly improve upon existing techniques. Furthermore, we use AMUSED to represent queries and responses along with their context to build a retrieval-based conversational agent, which expert linguists have validated as engaging comprehensively with humans.
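A graph convolution over a dependency parse lets each token's representation absorb those of its syntactic head and dependents. The sketch below is a deliberately simplified single layer, assuming undirected edges plus a self-loop, one shared weight matrix `W`, an unnormalized sum, and a ReLU; syntactic GCNs often add direction- or label-specific weights and edge gates, which are omitted here. The names `matvec` and `gcn_layer` are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gcn_layer(H, edges, W, b):
    """One simplified graph-convolution layer over a dependency parse.

    H     : list of token vectors, one per token
    edges : list of (head, dependent) index pairs from the parser
    W, b  : shared weight matrix and bias (hypothetical parameters)
    """
    n = len(H)
    # Neighbourhood of each token: itself plus its syntactic head/dependents.
    neighbours = [{i} for i in range(n)]
    for head, dep in edges:
        neighbours[head].add(dep)
        neighbours[dep].add(head)
    out = []
    for v in range(n):
        agg = [0.0] * len(b)
        for u in neighbours[v]:
            agg = [a + m for a, m in zip(agg, matvec(W, H[u]))]
        # ReLU(sum over neighbours of W h_u, plus b)
        out.append([max(0.0, a + bi) for a, bi in zip(agg, b)])
    return out

# "the cat sat": cat -> the, sat -> cat as (head, dependent) pairs.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(1, 0), (2, 1)]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(H, edges, identity, [0.0, 0.0]))
# [[1.0, 1.0], [2.0, 2.0], [1.0, 2.0]]
```

Stacking k such layers lets information travel k arcs through the parse tree, which is how syntactically distant but related words can influence each other.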

Paper Structure

This paper contains 19 sections, 5 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overview of AMUSED. AMUSED first encodes each sentence by concatenating the embeddings (denoted by $\oplus$) from the Bi-LSTM and the Syntactic GCN for each token, followed by word attention. The sentence embedding is then concatenated with the knowledge embedding from the Knowledge Module (Figure 2). The query embedding passes through the Memory Module (Figure 3) before the model is trained using triplet loss. Please see the model details section for more information.
  • Figure 2: Description of the Knowledge Module. The input sentence is passed to a pre-trained BERT model, whose output is concatenated with the averaged embedding of the KB-neighbors of entities present in the input. Refer to the Knowledge Module details section for a detailed explanation.
  • Figure 3: Description of the Memory Module. The query representation and the BERT embeddings of the context sentences are passed to the memory network to capture the dialogue context. Please see the Memory Module details section for more information.
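The three figures describe vector operations that can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not the paper's implementation: word attention is scored by a dot product against a hypothetical learned vector `attn_query`; the Knowledge Module falls back to a zero vector when no KB neighbors are found; and the Memory Module is reduced to a single dot-product attention hop whose read-out is added to the query. All function names are hypothetical, and the input vectors stand in for Bi-LSTM, GCN, and BERT outputs.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its matching weight."""
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(vectors[0]))]

def encode_sentence(lstm_states, gcn_states, attn_query):
    """Figure 1 encoder, sketched: concatenate each token's Bi-LSTM and GCN
    states, then pool tokens by word attention against `attn_query`
    (a hypothetical learned vector)."""
    tokens = [h + g for h, g in zip(lstm_states, gcn_states)]
    weights = softmax([dot(t, attn_query) for t in tokens])
    return weighted_sum(weights, tokens)

def knowledge_embedding(bert_vec, neighbour_ids, kb_emb, dim):
    """Figure 2, sketched: concatenate the sentence's BERT vector with the mean
    embedding of KB neighbours of its entities (zeros if none are found,
    an assumed fallback)."""
    vecs = [kb_emb[n] for n in neighbour_ids if n in kb_emb]
    kb_part = weighted_sum([1.0 / len(vecs)] * len(vecs), vecs) if vecs else [0.0] * dim
    return bert_vec + kb_part

def memory_hop(query, context):
    """Figure 3, sketched as one memory-network hop: attend over the
    context-sentence embeddings and add the read-out to the query."""
    weights = softmax([dot(query, c) for c in context])
    return [q + r for q, r in zip(query, weighted_sum(weights, context))]

# Toy inputs, made up for illustration.
print(encode_sentence([[1.0], [0.0]], [[0.0], [1.0]], [0.0, 0.0]))  # [0.5, 0.5]
print(knowledge_embedding([0.5, 0.5], ["Paris", "France"],
                          {"Paris": [1.0, 0.0], "France": [0.0, 1.0]}, 2))
# [0.5, 0.5, 0.5, 0.5]
print(memory_hop([1.0, 0.0], [[0.0, 1.0], [0.0, -1.0]]))  # [1.0, 0.0]
```

Adding the read-out to the query, rather than replacing it, keeps the original query signal intact while mixing in whichever context sentences it attends to most.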