Dialogue Response Prefetching Based on Semantic Similarity and Prediction Confidence of Language Model

Kiyotada Mori, Seiya Kawano, Angel Fernando Garcia Contreras, Koichiro Yoshino

TL;DR

This work tackles user-perceived latency in cascaded spoken dialogue systems by prefetching responses under the control of a Prediction Confidence Model (PCM) based on semantic similarity. The system predicts the complete user utterance from partial input, and the PCM estimates how likely the predicted utterance is to match the true utterance semantically, with similarity measured by S-BERT. Prefetching is triggered by a threshold on that similarity. Evaluations on English and Japanese task-oriented dialogue benchmarks show substantial prediction gains (often >400 ms) while keeping responses natural and high in quality; performance depends on similarity thresholds chosen per language. The results indicate that semantic-aware prefetching can outperform word-level matching and reduce latency without degrading the user experience, though real-world conditions such as ASR errors warrant further study.

Abstract

Prefetching of dialogue responses has been investigated as a way to reduce user-perceived latency (UPL), the user's waiting time before receiving the system's response, in spoken dialogue systems. To reduce the UPL, the system must predict the complete user utterance before the user finishes speaking, typically with a language model, so that a dialogue response can be prepared in advance. In this study, we propose a prediction confidence model (PCM) that decides whether prefetching is possible by estimating the semantic similarity between the predicted complete user utterance and the actual complete user utterance. We evaluated our PCM based on the differences between the predicted complete user utterance and the actual complete user utterance.
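To make the thresholding idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that sentence embeddings (for example, from an S-BERT encoder) are already available as plain lists of floats, and it uses cosine similarity as the similarity measure. The function names, the example threshold values, and the `min_confidence` parameter are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_label(
    predicted_emb: Sequence[float],
    true_emb: Sequence[float],
    threshold: float,
) -> int:
    """Assumed training-time label: 1 if the predicted utterance is
    semantically close enough to the true utterance, else 0."""
    return 1 if cosine_similarity(predicted_emb, true_emb) >= threshold else 0


def should_prefetch(confidence: float, min_confidence: float = 0.5) -> bool:
    """Assumed run-time decision: prefetch when the confidence model's
    estimate reaches a (hypothetical) minimum confidence."""
    return confidence >= min_confidence


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real sentence embeddings are much larger.
    predicted = [1.0, 0.0]
    actual = [0.8, 0.6]
    print(similarity_label(predicted, actual, threshold=0.7))
    print(should_prefetch(0.62))
```

At run time the true utterance is not yet known, so only the confidence estimate is available; the similarity label is the kind of target such a model could be trained to predict.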

Paper Structure

This paper contains 15 sections, 4 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Comparison of proposed PCM with conventional PCM in related work
  • Figure 2: 5-point scale evaluation on response naturalness