EnSToM: Enhancing Dialogue Systems with Entropy-Scaled Steering Vectors for Topic Maintenance
Heejae Suh, Yejin Jeon, Deokhyung Kang, Taehee Park, Yejin Min, Gary Geunbae Lee
TL;DR
The paper addresses topic adherence in task-oriented dialogue with resource-efficient small LLMs, where off-topic or unsafe inputs can degrade the user experience. It introduces EnSToM, a lightweight activation steering method that extracts a steering vector from model activations and scales it at inference time by an entropy-based factor, so that steering strength adapts to input uncertainty. Experiments on CantTalkAboutThis show that EnSToM significantly improves distractor refusal while preserving on-topic engagement, with a best overall accuracy of around 0.8 and demonstrated generalization across models and domains. A layer-wise entropy analysis further shows how intermediate layers process distractor versus on-topic inputs, supporting robust, low-data deployment of topic maintenance in real-world dialogue systems.
Abstract
Small large language models (sLLMs) are lightweight and efficient, which makes them suitable for resource-constrained environments. However, sLLMs often struggle to maintain topic consistency in task-oriented dialogue systems, which is critical in scenarios such as service chatbots. Specifically, the model must refuse off-topic or malicious inputs and adhere to its intended functionality in order to prevent potential misuse and uphold reliability. To this end, activation engineering approaches have been proposed that manipulate internal activations during inference. While these methods are effective in certain scenarios, our preliminary experiments reveal their limitations in ensuring topic adherence. To address this, we propose a novel approach termed Entropy-scaled Steering vectors for Topic Maintenance (EnSToM). EnSToM dynamically adjusts the steering intensity based on input uncertainty, which allows the model to handle off-topic distractors effectively while preserving on-topic accuracy. Our experiments demonstrate that EnSToM achieves a significant performance gain with relatively little data compared to fine-tuning approaches. By improving topic adherence without compromising efficiency, our approach provides a robust solution for enhancing sLLM-based dialogue systems.
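To make the idea of entropy-scaled steering concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The mean-difference construction of the steering vector, the logistic scaling function, and all names and parameters (mean_difference_vector, entropy_scaled_coefficient, threshold, max_coeff, sharpness) are assumptions introduced for illustration. In particular, this section does not state how the entropy is computed or in which direction it should modulate the steering strength, so the sign of sharpness is left configurable.

```python
import math

def mean_difference_vector(pos_acts, neg_acts):
    """Steering vector as mean(pos) - mean(neg).
    Assumption: a common activation-steering recipe, not confirmed by the source."""
    dim = len(pos_acts[0])
    pos_mean = [sum(a[i] for a in pos_acts) / len(pos_acts) for i in range(dim)]
    neg_mean = [sum(a[i] for a in neg_acts) / len(neg_acts) for i in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_scaled_coefficient(h, threshold, max_coeff, sharpness=5.0):
    """Hypothetical scaling rule: a logistic function of (h - threshold).
    A positive sharpness steers more at high entropy; a negative one inverts this."""
    return max_coeff / (1.0 + math.exp(-sharpness * (h - threshold)))

def steer(hidden, vector, coeff):
    """Add the scaled steering vector to a hidden-state vector."""
    return [x + coeff * v for x, v in zip(hidden, vector)]

# Toy usage with made-up 2-dimensional activations and 3-way logits.
vector = mean_difference_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]])
h = entropy(softmax([2.0, 0.5, 0.1]))
coeff = entropy_scaled_coefficient(h, threshold=0.5, max_coeff=4.0)
steered = steer([0.2, -0.1], vector, coeff)
```

In a real model, the hidden state would be taken from a chosen transformer layer and the steered vector written back during the forward pass. Which layer to use, and how to set the threshold, would need to be determined empirically.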
