Don't Shoot The Breeze: Topic Continuity Model Using Nonlinear Naive Bayes With Attention

Shu-Ting Pi; Pradeep Bagavan; Yejia Li; Disha; Qun Liu

TL;DR

The paper tackles maintaining topic continuity in long LLM-driven conversations by proposing a nonlinear Naive Bayes framework augmented with an attention mechanism. It derives a tractable log-probability score ${\log P(y|S_1\cdots S_N)}$ that combines an attention term and a residual correction, using a nonlinear functional ${\mathcal F}$ and a data-driven coefficient ${\alpha}$. Core components are estimated with NSP-based models (notably Conversational BERT) and OOD-based sentence-context priors using Isolation Forest on Sentence-BERT features, enabling robust scoring for long and leap conversations. Experimental results on a synthetic Amazon dataset show improved AUC and accuracy, with demonstrated robustness to token-length and topic-shift scenarios, highlighting practical applicability for responsible and interpretable LLM use.

Abstract

Using Large Language Models (LLMs) as chatbots in diverse business scenarios often presents the challenge of maintaining topic continuity. Abrupt topic shifts can lead to poor user experiences and inefficient use of computational resources. In this paper, we present a topic continuity model that assesses whether a response aligns with the initial conversation topic. Our model is built by expanding the corresponding natural language understanding (NLU) model into quantifiable terms using a Naive Bayes approach. We then introduce an attention mechanism and logarithmic nonlinearity to enhance its ability to capture topic continuity. This approach allows us to convert the NLU model into an interpretable analytical formula. In contrast to many NLU models constrained by token limits, our proposed model can handle conversations of any length with linear time complexity. Furthermore, the attention mechanism significantly improves the model's ability to identify topic continuity in complex conversations. According to our experiments, our model consistently outperforms traditional methods, particularly on lengthy and intricate conversations. This capability offers an opportunity to ensure the responsible and interpretable use of LLMs.
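To make the Naive Bayes expansion concrete, the sketch below shows one way per-segment scores could be combined in a single linear pass. It is an illustration under stated assumptions, not the formula derived in the paper: it uses the textbook binary Naive Bayes identity (segments conditionally independent given the label), under which the log-odds of continuity equal the prior log-odds plus the sum of each segment's log-odds shift relative to the prior. The function name `continuity_score`, the per-segment probabilities `segment_probs` (imagined as outputs of some segment-level scorer), and the optional `weights` standing in for attention are all hypothetical. The paper's nonlinear functional and residual coefficient are not reproduced here.

```python
import math


def logit(p, eps=1e-9):
    """Log-odds of p, clamped away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p) - math.log(1.0 - p)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def continuity_score(segment_probs, prior=0.5, weights=None):
    """Combine per-segment probabilities P(y | S_i) into P(y | S_1..S_N).

    Assumes segments are conditionally independent given the label y
    (plain binary Naive Bayes). `weights` is a hypothetical stand-in for
    attention: each weight scales one segment's log-odds contribution.
    Runs in O(N) time for N segments.
    """
    if weights is None:
        weights = [1.0] * len(segment_probs)  # unweighted Naive Bayes
    if len(weights) != len(segment_probs):
        raise ValueError("weights and segment_probs must have equal length")

    prior_logit = logit(prior)
    score = prior_logit
    for p, w in zip(segment_probs, weights):
        # Each segment shifts the log-odds by its evidence beyond the prior.
        score += w * (logit(p) - prior_logit)
    return sigmoid(score)


if __name__ == "__main__":
    # Two mildly on-topic segments reinforce each other when unweighted.
    print(continuity_score([0.9, 0.9]))
    # Weights that sum to one average the evidence instead of stacking it.
    print(continuity_score([0.9, 0.9], weights=[0.5, 0.5]))
```

Because each segment is scored on its own and only a running sum is kept, no single model call has to see the whole conversation, which is the property that lets this style of scoring sidestep a fixed token limit.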

Paper Structure (15 sections, 14 equations, 2 figures, 1 table)

Introduction
Nonlinear Naive Bayes With Attention Mechanism
Model Definition
Naive Bayes With Attention
Logarithmic Non-linearity
Formulation of Nonlinear Transformation
Designing Attention Functional
Designing Residual Coefficient
Estimation of Fundamental Components
Experiments
Dataset
Benchmark Test
Exploration of the Residual Term
Exploration of the Attention Mechanism
Conclusion

Figures (2)

Figure 1: Computation graph for calculating the NLU likelihood (highlighted in orange). The blue blocks represent fundamental components of our model.
Figure 2: Impact of attention and residual terms. (a)-(b): Normalized Distribution of $P_{nlu}$ without residual term (a) and with residual term (b) for selected uncertain examples. Red lines indicate approximate Gaussian kernel density fitting. (c)-(d): Average probability output per segmentation, categorized by token length, is shown in (c) for NSP and (d) for our model. The dashed lines denote 300 tokens. Data beyond 512 tokens were truncated in (c) due to NSP's processing limit.
