EnSToM: Enhancing Dialogue Systems with Entropy-Scaled Steering Vectors for Topic Maintenance

Heejae Suh, Yejin Jeon, Deokhyung Kang, Taehee Park, Yejin Min, Gary Geunbae Lee

TL;DR

The paper addresses the challenge of maintaining topic adherence in task-oriented dialogue with resource-efficient small LLMs, where off-topic or unsafe inputs can degrade user experience. It introduces EnSToM, a lightweight activation steering method that extracts a steering vector from model activations and scales it at inference time with an entropy-based coefficient, so that steering strength adapts to input uncertainty. Experiments on CantTalkAboutThis show that EnSToM significantly improves distractor refusal while preserving on-topic engagement, with a best overall accuracy of around 0.8 and demonstrated generalization across models and domains. A layer-wise entropy analysis further shows how intermediate layers process distractor versus on-topic inputs, supporting robust, low-data deployment of topic maintenance in real-world dialogue systems.

Abstract

Small large language models (sLLMs) offer the advantage of being lightweight and efficient, which makes them suitable for resource-constrained environments. However, sLLMs often struggle to maintain topic consistency in task-oriented dialogue systems, which is critical for scenarios such as service chatbots. Specifically, it is important to ensure that the model denies off-topic or malicious inputs and adheres to its intended functionality so as to prevent potential misuse and uphold reliability. Towards this, existing activation engineering approaches have been proposed to manipulate internal activations during inference. While these methods are effective in certain scenarios, our preliminary experiments reveal their limitations in ensuring topic adherence. Therefore, to address this, we propose a novel approach termed Entropy-scaled Steering vectors for Topic Maintenance (EnSToM). EnSToM dynamically adjusts the steering intensity based on input uncertainty, which allows the model to handle off-topic distractors effectively while preserving on-topic accuracy. Our experiments demonstrate that EnSToM achieves significant performance gain with a relatively small data size compared to fine-tuning approaches. By improving topic adherence without compromising efficiency, our approach provides a robust solution for enhancing sLLM-based dialogue systems.
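The idea of scaling steering intensity by input uncertainty can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sigmoid gate, the `sharpness` parameter, and the assumption that higher entropy should trigger stronger steering are all hypothetical choices made for illustration. Only the general ingredients (an entropy measure, a threshold $t$, and a steering vector added to activations) come from the text above.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def scaled_coefficient(h, threshold, max_coeff=1.0, sharpness=5.0):
    """Hypothetical gate: a sigmoid centred on the threshold.

    Assumption for illustration only: entropy above the threshold
    pushes the coefficient toward max_coeff, entropy below it toward 0.
    """
    return max_coeff / (1.0 + math.exp(-sharpness * (h - threshold)))


def steer(activation, steering_vector, coeff):
    """Add the scaled steering vector to a hidden activation, element-wise."""
    return [a + coeff * v for a, v in zip(activation, steering_vector)]


# Toy usage with made-up numbers (not taken from the paper).
probs = softmax([2.0, 0.5, 0.1, 0.1])
coeff = scaled_coefficient(entropy(probs), threshold=1.0)
steered = steer([0.2, -0.4, 1.0], [1.0, 0.0, -1.0], coeff)
```

In a real model the distribution would come from an intermediate layer and the steering vector from contrasting activations; here both are stand-ins so the control flow stays visible.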

Paper Structure

This paper contains 39 sections, 6 equations, 6 figures, and 8 tables.

Figures (6)

  • Figure 1: The example above illustrates that bots tend to provide only refusal responses when using vanilla steering to improve on-topic response generation. On the other hand, EnSToM is able to generate more contextually appropriate responses.
  • Figure 2: Overall process. After extracting steering vectors and applying entropy-based coefficient scaling, responses are generated using the entropy-based scaled steering vectors to maintain on-topic accuracy.
  • Figure 3: Comparison of entropy distribution in different layers of Llama-2-7b-chat.
  • Figure 4: Effect of entropy-based scaling at different thresholds $t$.
  • Figure 5: Entropy distribution of on-topic and distractor inputs for the jailbreak defense task at layer 33 of the Ministral-8b-Instruct-2410 model.
  • ...and 1 more figure