Improving LLM Reliability through Hybrid Abstention and Adaptive Detection

Ankit Sharma; Nachiket Tapas; Jyotiprakash Patra

TL;DR

This work introduces an adaptive abstention system that dynamically adjusts safety thresholds based on real-time contextual signals such as domain and user history, offering a scalable solution for reliable LLM deployment.
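As a rough illustration of what context-dependent thresholding could look like, the sketch below shifts a base abstention threshold by domain and by user history. The offsets, the penalty per prior violation, the clamping bounds, and all names (`Context`, `DOMAIN_OFFSETS`, `adaptive_threshold`, `should_abstain`) are assumptions made for this example, not values or interfaces taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical per-domain offsets: a positive value relaxes the threshold
# (fewer refusals). These numbers are illustrative, not from the paper.
DOMAIN_OFFSETS = {"medical": 0.10, "creative_writing": 0.15, "general": 0.0}


@dataclass
class Context:
    domain: str
    prior_violations: int  # assumed signal: earlier flagged queries by this user


def adaptive_threshold(base: float, ctx: Context) -> float:
    """Return the risk score at or above which the system abstains."""
    offset = DOMAIN_OFFSETS.get(ctx.domain, 0.0)
    # Tighten the threshold for users with flagged history, capped at 4 events.
    penalty = 0.05 * min(ctx.prior_violations, 4)
    threshold = base + offset - penalty
    # Clamp so the threshold never becomes trivially strict or trivially lax.
    return min(max(threshold, 0.05), 0.95)


def should_abstain(risk_score: float, base: float, ctx: Context) -> bool:
    """Abstain when the detector's risk score reaches the adapted threshold."""
    return risk_score >= adaptive_threshold(base, ctx)
```

Under these assumed numbers, a query scored at 0.55 is refused in the general domain with a base threshold of 0.5, but answered in the medical domain, where the threshold rises to 0.6.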

Abstract

Large Language Models (LLMs) deployed in production environments face a fundamental safety-utility trade-off: strict filtering mechanisms prevent harmful outputs but often block benign queries, whereas relaxed controls risk generating unsafe content. Conventional guardrails based on static rules or fixed confidence thresholds are typically context-insensitive and computationally expensive, resulting in high latency and a degraded user experience. To address these limitations, we introduce an adaptive abstention system that dynamically adjusts safety thresholds based on real-time contextual signals such as domain and user history. The proposed framework integrates a multi-dimensional detection architecture composed of five parallel detectors, combined through a hierarchical cascade mechanism to optimize both speed and precision. The cascade design reduces unnecessary computation by progressively filtering queries, achieving substantial latency improvements over non-cascaded models and external guardrail systems. Extensive evaluation on mixed and domain-specific workloads demonstrates significant reductions in false positives, particularly in sensitive domains such as medical advice and creative writing. The system maintains high safety precision and near-perfect recall under strict operating modes. Overall, our context-aware abstention framework balances safety and utility while preserving performance, offering a scalable solution for reliable LLM deployment.
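The progressive filtering idea can be sketched as a cascade with early exits: cheap stages decide clear-cut queries, and only ambiguous ones reach the slower stages. The example below is a minimal sketch under stated assumptions, not the authors' implementation. It uses two toy stages rather than the five detectors described above, and the detector logic, the phrase lists, the exit thresholds, and the names `fast_detector`, `deep_detector`, and `run_cascade` are all hypothetical.

```python
from typing import Callable, List, Tuple

Detector = Callable[[str], float]  # maps a query to a risk score in [0, 1]


def fast_detector(query: str) -> float:
    """Cheap phrase check (hypothetical lists, for illustration only)."""
    text = query.lower()
    if any(p in text for p in ("build a bomb", "steal credentials")):
        return 1.0  # clearly harmful
    if any(w in text for w in ("exploit", "overdose")):
        return 0.5  # ambiguous: defer to a later stage
    return 0.1  # clearly benign


def deep_detector(query: str) -> float:
    """Stand-in for an expensive model-based check."""
    return 0.8 if "write an exploit" in query.lower() else 0.2


def run_cascade(
    query: str,
    stages: List[Tuple[str, Detector]],
    allow_below: float = 0.15,
    block_above: float = 0.9,
    final_threshold: float = 0.5,
) -> Tuple[str, str]:
    """Return (decision, name of the stage that produced the deciding score)."""
    name, score = "none", 0.0
    for name, detector in stages:
        score = detector(query)
        if score >= block_above:
            return "abstain", name  # early exit: confidently unsafe
        if score <= allow_below:
            return "answer", name  # early exit: confidently safe
    # No stage was confident: apply the final threshold to the last score.
    return ("abstain" if score >= final_threshold else "answer"), name


STAGES = [("fast", fast_detector), ("deep", deep_detector)]
```

With these toy stages, `run_cascade("What is the capital of France?", STAGES)` exits at the fast stage, while a query containing "exploit" is passed on to the deep stage before a decision is made. The latency benefit in such a design comes from how many queries leave at the first stage, which this sketch does not measure.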

Paper Structure (12 sections, 8 equations, 6 figures, 4 tables)

Introduction
Related Work
System Architecture
Methodology
Experiments and Results
Conclusion
Implementation Details
Detector Configuration
Cascade Exit Rates
Validation Results
10-Trial Validation Summary
Graphs and Visualizations

Figures (6)

Figure 1: Safety-utility trade-off. The adaptive abstention layer (teal) achieves a superior balance compared to static guardrails (slate) and confidence-based methods (blue).
Figure 2: Abstention engine core: input processing, parallel detection pipeline, and cascade stages.
Figure 3: Our cascade architecture achieves a $10\times$ speedup over external guardrails by filtering most queries on the fast path and reserving deep analysis for ambiguous cases.
Figure 4: Comparative analysis of Raw vs. Guarded model performance. The Abstention Layer (teal) significantly reduces unsafe responses compared to raw models (purple), particularly for unknown and harmful queries, with a +40% safety filtering impact.
Figure 5: False positive rate by domain under static vs. adaptive thresholding. Adaptive thresholding significantly reduces over-refusal in Creative Writing and Medical contexts.
...and 1 more figure
