Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots

Yu Wu, Wei Wu, Chen Xing, Ming Zhou, Zhoujun Li

TL;DR

The paper tackles multi-turn response selection in retrieval-based chatbots by introducing the Sequential Matching Network (SMN), a matching-first architecture that couples utterance-response matching with sequential accumulation. SMN computes word-level and segment-level matching between each utterance and a candidate response, distills each pair into a matching vector with a CNN, and then feeds these vectors to a GRU in utterance order to model relationships across the context. The paper describes three prediction variants (SMN_last, SMN_static, SMN_dynamic) and reports substantial improvements over strong baselines on the Ubuntu and Douban datasets in $R_{10}@1$, MAP, and other metrics. The authors also release the Douban Conversation Corpus, a large human-labeled dataset for open-domain multi-turn response selection. The results support the approach across domains and highlight the importance of preserving utterance-level information and inter-utterance dependencies. Overall, SMN advances context-aware response selection by combining multi-granularity matching with temporal accumulation.
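
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how $R_n@k$ and average precision can be computed for a single context. It assumes the usual definitions: $R_n@k$ is the fraction of correct responses ranked in the top $k$ of $n$ candidates, and MAP is the mean of the per-context average precision. The function names and the example scores are hypothetical and are not taken from the paper.

```python
def rank_by_score(scores):
    """Return candidate indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(scores, labels, k):
    """R_n@k for one context, where n = len(scores) and labels are 0/1."""
    positives = sum(labels)
    if positives == 0:
        return 0.0
    top_k = rank_by_score(scores)[:k]
    return sum(labels[i] for i in top_k) / positives


def average_precision(scores, labels):
    """Average precision for one context; MAP averages this over contexts."""
    hits, total = 0, 0.0
    for rank, i in enumerate(rank_by_score(scores), start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical model scores for 10 candidates; the correct response is index 7,
# which has the second-highest score.
scores = [0.10, 0.90, 0.30, 0.20, 0.05, 0.40, 0.15, 0.60, 0.25, 0.35]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]

r10_at_1 = recall_at_k(scores, labels, 1)
r10_at_2 = recall_at_k(scores, labels, 2)
ap = average_precision(scores, labels)
print(r10_at_1, r10_at_2, ap)
```

Here the correct response sits at rank 2, so it counts toward $R_{10}@2$ but not $R_{10}@1$.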

Abstract

We study response selection for multi-turn conversation in retrieval-based chatbots. Existing work either concatenates utterances in context or matches a response with a highly abstract context vector finally, which may lose relationships among utterances or important contextual information. We propose a sequential matching network (SMN) to address both problems. SMN first matches a response with each utterance in the context on multiple levels of granularity, and distills important matching information from each pair as a vector with convolution and pooling operations. The vectors are then accumulated in a chronological order through a recurrent neural network (RNN) which models relationships among utterances. The final matching score is calculated with the hidden states of the RNN. An empirical study on two public data sets shows that SMN can significantly outperform state-of-the-art methods for response selection in multi-turn conversation.
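
The pipeline described in the abstract (match each utterance with the response, distill each pair with convolution and pooling, accumulate with an RNN, score from the hidden state) can be sketched in NumPy as follows. This is an illustrative simplification, not the authors' implementation: it uses only a word-level similarity matrix, a handful of 3x3 filters with global max pooling, a plain GRU cell, a sigmoid score on the last hidden state, and random untrained weights. All function names, parameter names, and dimensions are assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_valid(m, kernel):
    """Plain 'valid' 2D cross-correlation of matrix m with one kernel."""
    kh, kw = kernel.shape
    h, w = m.shape[0] - kh + 1, m.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(m[i:i + kh, j:j + kw] * kernel)
    return out


def matching_vector(utt, resp, kernels):
    """Word-level similarity matrix -> conv + ReLU -> global max pool per filter."""
    sim = utt @ resp.T  # shape: (utterance words, response words)
    return np.array([np.maximum(conv2d_valid(sim, k), 0.0).max() for k in kernels])


def gru_step(h, x, p):
    """One standard GRU update (biases omitted for brevity)."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))
    return z * h_tilde + (1.0 - z) * h


def smn_last_score(utterances, response, params):
    """Accumulate matching vectors in utterance order; score from the last state."""
    h = np.zeros(params["hidden"])
    for utt in utterances:
        h = gru_step(h, matching_vector(utt, response, params["kernels"]), params["gru"])
    return float(sigmoid(params["w"] @ h + params["b"]))


def init_params(rng, n_kernels=4, hidden=6):
    """Random, untrained weights; sizes are arbitrary choices for the demo."""
    def mat(*shape):
        return rng.normal(scale=0.1, size=shape)
    gru = {name: mat(hidden, n_kernels) for name in ("Wz", "Wr", "Wh")}
    gru.update({name: mat(hidden, hidden) for name in ("Uz", "Ur", "Uh")})
    return {"kernels": [mat(3, 3) for _ in range(n_kernels)], "gru": gru,
            "w": mat(hidden), "b": 0.0, "hidden": hidden}


rng = np.random.default_rng(0)
params = init_params(rng)
# Hypothetical context: three utterances of 5, 4, and 6 words, embedding size 8.
context = [rng.normal(size=(n_words, 8)) for n_words in (5, 4, 6)]
response = rng.normal(size=(5, 8))
score = smn_last_score(context, response, params)
print(score)
```

Because each utterance is matched with the response before any aggregation, per-utterance information reaches the recurrent layer intact, which is the design point the abstract contrasts with concatenating utterances or matching against a single context vector.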

Paper Structure

This paper contains 18 sections, 8 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Architecture of SMN
  • Figure 2: Model visualization. Darker areas mean larger value.
  • Figure 3: Comparison across context length
  • Figure 4: Performance of SMN across maximum context length