Improving LLM Reliability through Hybrid Abstention and Adaptive Detection

Ankit Sharma, Nachiket Tapas, Jyotiprakash Patra

TL;DR

This work introduces an adaptive abstention system that dynamically adjusts safety thresholds based on real-time contextual signals such as domain and user history, offering a scalable solution for reliable LLM deployment.

Abstract

Large Language Models (LLMs) deployed in production environments face a fundamental safety-utility trade-off: strict filtering mechanisms prevent harmful outputs but often block benign queries, whereas relaxed controls risk unsafe content generation. Conventional guardrails based on static rules or fixed confidence thresholds are typically context-insensitive and computationally expensive, resulting in high latency and degraded user experience. To address these limitations, we introduce an adaptive abstention system that dynamically adjusts safety thresholds based on real-time contextual signals such as domain and user history. The proposed framework integrates a multi-dimensional detection architecture composed of five parallel detectors, combined through a hierarchical cascade mechanism to optimize both speed and precision. The cascade design reduces unnecessary computation by progressively filtering queries, achieving substantial latency improvements compared to non-cascaded models and external guardrail systems. Extensive evaluation on mixed and domain-specific workloads demonstrates significant reductions in false positives, particularly in sensitive domains such as medical advice and creative writing. The system maintains high safety precision and near-perfect recall under strict operating modes. Overall, our context-aware abstention framework effectively balances safety and utility while preserving performance, offering a scalable solution for reliable LLM deployment.
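
To make the two central ideas concrete (a context-dependent threshold and a cascade that exits early when a cheap stage is confident), the following is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the names `Context`, `adaptive_threshold`, `cascade`, the two toy detectors, the domain offsets, and the margin are all hypothetical, and the five parallel detectors described above are reduced to two sequential stages for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical per-domain offsets; the values are illustrative, not from the paper.
DOMAIN_OFFSET = {"medical": 0.15, "creative_writing": 0.20, "general": 0.0}


@dataclass
class Context:
    domain: str = "general"
    prior_violations: int = 0  # simple stand-in for "user history"


def adaptive_threshold(base: float, ctx: Context) -> float:
    """Raise the refusal threshold in domains prone to over-refusal and
    lower it for users with prior violations (assumed policy)."""
    t = base + DOMAIN_OFFSET.get(ctx.domain, 0.0) - 0.05 * ctx.prior_violations
    return min(0.95, max(0.05, t))


Detector = Callable[[str], float]  # returns a risk score in [0, 1]


def keyword_detector(query: str) -> float:
    """Cheap first stage: toy phrase matching."""
    q = query.lower()
    if any(p in q for p in ("build a bomb", "steal credentials")):
        return 1.0
    if any(w in q for w in ("dosage", "weapon")):
        return 0.5  # sensitive but ambiguous
    return 0.05


def deep_detector(query: str) -> float:
    """Placeholder for an expensive model-based stage."""
    return 0.6 if "dosage" in query.lower() else 0.1


STAGES: List[Tuple[str, Detector]] = [
    ("keyword", keyword_detector),
    ("deep", deep_detector),
]


def cascade(query: str, ctx: Context, base: float = 0.5,
            margin: float = 0.3) -> Tuple[str, str]:
    """Return (decision, deciding_stage). A stage decides early only when
    its score is at least `margin` away from the adaptive threshold;
    otherwise the query escalates to the next, slower stage."""
    threshold = adaptive_threshold(base, ctx)
    score, name = 0.0, ""
    for name, detector in STAGES:
        score = detector(query)
        if score >= min(1.0, threshold + margin):
            return ("abstain", name)
        if score <= max(0.0, threshold - margin):
            return ("answer", name)
    # No stage was confident: fall back to a plain threshold comparison.
    return ("abstain" if score >= threshold else "answer", name)
```

In this sketch a clearly benign or clearly harmful query is resolved by the cheap stage, while an ambiguous one escalates. The same ambiguous query (for example, one about medication dosage) is answered in the medical domain but refused in the general domain, which is the kind of false-positive reduction that context-aware thresholds are meant to provide.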

Paper Structure (12 sections, 8 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Safety-utility trade-off. The adaptive abstention layer (teal) achieves a superior balance compared to static guardrails (slate) and confidence-based methods (blue).
  • Figure 2: Abstention engine core: input processing, parallel detection pipeline, and cascade stages.
  • Figure 3: Our cascade architecture achieves a $10\times$ speedup over external guardrails by filtering most queries on the fast path and reserving deep analysis for ambiguous cases.
  • Figure 4: Comparative analysis of Raw vs. Guarded model performance. The Abstention Layer (teal) significantly reduces unsafe responses compared to raw models (purple), particularly for unknown and harmful queries, with a +40% safety filtering impact.
  • Figure 5: False positive rate by domain under static vs. adaptive thresholding. Adaptive thresholding significantly reduces over-refusal in Creative Writing and Medical contexts.
  • ...and 1 more figures