Online Domain-aware LLM Decoding for Continual Domain Evolution

Mohammad Abu-Shaira, Weishi Shi

TL;DR

This paper addresses continual domain evolution with Online Domain-aware Decoding (ODD), an inference-time framework that avoids retraining. ODD fuses a fine-tuned base LLM with an online prefix-tree prior, using adaptive confidence signals derived from disagreement and temporal continuity to steer the interpolation between the two sources, while adding near-zero latency. The approach is validated on synthetic drift scenarios with a placeholder-rich dataset, where ODD improves consistently over LLM-Greedy and LLM-Temp Scaled, with an absolute ROUGE-L gain of 0.065 and a 13.6% relative improvement in Cosine Similarity, indicating robustness to lexical and contextual drift. For dynamic LLM deployments, this enables rapid, drift-aware adaptation without weight updates or external retrieval.

Abstract

LLMs are typically fine-tuned offline on domain-specific data, assuming a static domain. In practice, domain knowledge evolves continuously through new regulations, products, services, and interaction patterns. Retraining or fine-tuning LLMs for every new instance is computationally infeasible. Real-world environments also exhibit temporal dynamics with shifting data distributions. Disregarding this phenomenon, commonly referred to as concept drift, can significantly diminish a model's predictive accuracy. This mismatch between evolving domains and static adaptation pipelines highlights the need for efficient, real-time adaptation without costly retraining. In response, we introduce the Online Domain-aware Decoding (ODD) framework. ODD performs probability-level fusion between a base LLM and a prefix-tree prior, guided by adaptive confidence modulation using disagreement and continuity signals. Empirical evaluation under diverse drift scenarios demonstrates that ODD consistently surpasses LLM-Greedy and LLM-Temp Scaled across all syntactic and semantic NLG metrics. It yields an absolute ROUGE-L gain of 0.065 and a 13.6% relative improvement in Cosine Similarity over the best baseline. These results demonstrate ODD's robustness to evolving lexical and contextual patterns, making it suitable for dynamic LLM applications.
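
To make "probability-level fusion" concrete, the following is a minimal sketch, not the paper's actual formulation. It assumes a linear interpolation between the LLM's next-token distribution and the prior's, with a mixing weight driven by two scalar signals. The function names, the use of total variation distance as the disagreement signal, the `continuity` score in [0, 1], and the `max_lam` cap are all hypothetical choices made for illustration.

```python
def disagreement(p_llm, p_prior):
    """Total variation distance in [0, 1] between two token distributions.

    Assumption: the paper's disagreement signal may be defined differently.
    """
    tokens = set(p_llm) | set(p_prior)
    return 0.5 * sum(abs(p_llm.get(t, 0.0) - p_prior.get(t, 0.0)) for t in tokens)


def adaptive_weight(p_llm, p_prior, continuity, max_lam=0.8):
    """Hypothetical confidence rule: trust the prior more when recent tokens
    followed a known prefix (high continuity), and dampen that trust when the
    two sources disagree strongly."""
    d = disagreement(p_llm, p_prior)
    lam = max_lam * continuity * (1.0 - 0.5 * d)
    return min(max(lam, 0.0), 1.0)


def fuse(p_llm, p_prior, lam):
    """Linear interpolation of two next-token distributions, renormalized."""
    tokens = set(p_llm) | set(p_prior)
    mixed = {
        t: (1.0 - lam) * p_llm.get(t, 0.0) + lam * p_prior.get(t, 0.0)
        for t in tokens
    }
    z = sum(mixed.values())
    return {t: v / z for t, v in mixed.items()} if z > 0 else dict(p_llm)


# Toy example: the LLM prefers "a", the prior has only ever seen "b".
p_llm = {"a": 0.7, "b": 0.3}
p_prior = {"b": 1.0}

lam_high = adaptive_weight(p_llm, p_prior, continuity=1.0)
lam_low = adaptive_weight(p_llm, p_prior, continuity=0.0)

fused_high = fuse(p_llm, p_prior, lam_high)
fused_low = fuse(p_llm, p_prior, lam_low)

best_high = max(fused_high, key=fused_high.get)
best_low = max(fused_low, key=fused_low.get)
```

In this toy setting, high continuity shifts the decision toward the prior's token, while zero continuity leaves the LLM's distribution unchanged. Because the fusion only touches the output distribution, no weights are updated, which is consistent with the inference-time framing above.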

Paper Structure (18 sections, 8 equations, 3 figures, 1 table, 1 algorithm)

Figures (3)

  • Figure 1: $N$-gram Trie with Frequency ($\mathbf{F}$), Length ($\mathbf{L}$), and Recency ($\mathbf{R}$).
  • Figure 2: NLG Metrics (Mean $\pm$ 95% CI) under abrupt drift.
  • Figure 3: Concept drift in placeholders: (top) Abrupt, (middle) Incremental, (bottom) Gradual.
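
The trie in Figure 1 can be illustrated with a small sketch of an online $N$-gram prefix tree. The figure itself is not reproduced here, so the definitions below are assumptions: frequency is a per-node visit count, length is the number of context tokens matched, and recency is the number of updates since a node was last seen. The class and method names (`NGramTrie`, `update`, `lookup`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)
    freq: int = 0        # how many times this n-gram path was observed
    last_seen: int = -1  # update step at which it was last observed


class NGramTrie:
    def __init__(self, max_n=3):
        self.root = Node()
        self.max_n = max_n
        self.step = 0  # counts calls to update(); used for recency

    def update(self, tokens):
        """Insert every n-gram (n <= max_n) of a newly observed sequence."""
        self.step += 1
        for i in range(len(tokens)):
            node = self.root
            for tok in tokens[i:i + self.max_n]:
                node = node.children.setdefault(tok, Node())
                node.freq += 1
                node.last_seen = self.step

    def lookup(self, context):
        """Find the longest context suffix present in the trie.

        Returns (matched_length, {next_token: (freq, recency)}), backing off
        to shorter suffixes and finally to unigrams (matched_length 0).
        """
        first = max(0, len(context) - (self.max_n - 1))
        for start in range(first, len(context) + 1):
            node = self.root
            for tok in context[start:]:
                node = node.children.get(tok)
                if node is None:
                    break
            if node is not None and node.children:
                stats = {
                    t: (c.freq, self.step - c.last_seen)
                    for t, c in node.children.items()
                }
                return len(context) - start, stats
        return 0, {}


trie = NGramTrie(max_n=3)
trie.update(["reset", "your", "password"])
trie.update(["reset", "your", "pin"])
trie.update(["reset", "your", "pin"])

matched, stats = trie.lookup(["please", "reset", "your"])
```

Here `matched` is 2 and `stats` maps `"pin"` to a higher frequency and a lower recency value than `"password"`, so a prior built from these statistics would favor the continuation seen more often and more recently. How the three signals are actually combined into a prior probability is not specified in this summary and is left out of the sketch.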