
Dynamic Uncertainty Ranking: Enhancing Retrieval-Augmented In-Context Learning for Long-Tail Knowledge in LLMs

Shuyang Yu, Runxue Bao, Parminder Bhatia, Taha Kass-Hout, Jiayu Zhou, Cao Xiao

TL;DR

The paper tackles the instability of retrieval-augmented ICL for long-tail knowledge by introducing a reinforcement-learning-based dynamic uncertainty ranking that reorders retrieved samples according to their per-sample impact on LLM predictions. A learnable budget threshold $\sigma$ reduces query cost by selectively updating the retriever, while a policy-gradient objective guides the retriever to elevate informative samples and suppress misleading ones. Across five QA datasets with GPT-4, the method achieves consistent improvements over strong baselines, particularly boosting long-tail question accuracy by up to $5.96\%$ and averaging $2.97\%$ overall. The approach also demonstrates good efficiency and transferability, suggesting practical applicability for cost-conscious, cross-domain retrieval-augmented ICL in real-world systems.
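The TL;DR mentions a policy-gradient objective that raises informative samples and suppresses misleading ones. Below is a minimal sketch of one REINFORCE step under several assumptions. The retriever is a hypothetical linear scorer $S_\theta(x) = \theta^\top x$ over feature vectors of the pre-selected candidates. The ranking policy is a softmax over those scores. The reward is a scalar, $+1$ for a helpful sample and $-1$ for a misleading one. The feature vectors, reward values, learning rate, and function names are all illustrative and are not the paper's exact formulation.

```python
import math

def softmax(scores):
    # Numerically stable softmax over candidate scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score(theta, x):
    # Hypothetical linear retriever: S_theta(x) = theta . x
    return sum(t * xi for t, xi in zip(theta, x))

def reinforce_step(theta, features, chosen, reward, lr=0.1):
    """One REINFORCE update for a softmax ranking policy (a sketch).

    features: one feature vector per pre-selected candidate (assumed).
    chosen:   index of the candidate that was placed in the prompt.
    reward:   scalar LLM feedback (+1 helpful, -1 misleading; assumed).
    """
    probs = softmax([score(theta, x) for x in features])
    dim = len(theta)
    # For a softmax over linear scores: grad log pi(chosen) = x_chosen - E_pi[x]
    expected = [sum(p * x[d] for p, x in zip(probs, features)) for d in range(dim)]
    grad = [features[chosen][d] - expected[d] for d in range(dim)]
    return [t + lr * reward * g for t, g in zip(theta, grad)]

theta = [0.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0]]
# Candidate 1 misled the LLM, so its score should go down.
theta = reinforce_step(theta, features, chosen=1, reward=-1.0)
probs = softmax([score(theta, x) for x in features])
print(probs[0] > probs[1])  # True
```

A negative reward moves $\theta$ away from the misleading candidate's features, which lowers that candidate's rank in later prompts. A positive reward does the opposite.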

Abstract

Large language models (LLMs) can learn vast amounts of knowledge from diverse domains during pre-training. However, long-tail knowledge from specialized domains is often scarce and underrepresented, so models rarely memorize it. Prior work has shown that in-context learning (ICL) with retriever augmentation can help LLMs better capture long-tail knowledge, reducing their reliance on pre-trained data. Despite these advances, we observe that LLM predictions for long-tail questions remain sensitive to variations in the retrieved samples. To exploit this uncertainty in ICL and guide LLM predictions toward correct answers on long-tail samples, we propose a reinforcement learning-based dynamic uncertainty ranking method for ICL that accounts for the varying impact of each retrieved sample on LLM predictions. Our approach prioritizes more informative and stable samples while demoting misleading ones, updating the ranking based on LLM feedback for each retrieved sample. To improve training efficiency and reduce query costs, we introduce a learnable dynamic ranking threshold that is adjusted whenever the model encounters a negative prediction shift. Experimental results on question-answering datasets from several domains show that our method outperforms the best baseline by $2.76\%$, with a notable $5.96\%$ boost in accuracy on long-tail questions that elude zero-shot inference.
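The abstract describes two mechanisms: updating the ranking from LLM feedback on each retrieved sample, and reacting to negative prediction shifts. One way to obtain that per-sample signal is to query the LLM with the top-1, top-2, ..., top-$k$ ranked samples and record whether each added sample changes the prediction. This is a sketch under stated assumptions. `llm_predict` is a hypothetical callable standing in for a real LLM API call. Exact-match correctness is an assumed scoring rule.

```python
from typing import Callable, List, Sequence

def prediction_shifts(
    question: str,
    ranked_samples: Sequence[str],
    answer: str,
    llm_predict: Callable[[str, Sequence[str]], str],
) -> List[int]:
    """Per-sample feedback from 0-shot to k-shot inference (a sketch).

    Entry j compares the j-shot prediction with the (j+1)-shot prediction
    obtained after appending ranked_samples[j] to the prompt:
      +1  wrong to correct (positive shift)
      -1  correct to wrong (negative shift)
       0  correctness unchanged
    """
    shifts = []
    prev_correct = llm_predict(question, []) == answer  # 0-shot query
    for j in range(len(ranked_samples)):
        correct = llm_predict(question, ranked_samples[: j + 1]) == answer
        shifts.append(int(correct) - int(prev_correct))
        prev_correct = correct
    return shifts

def toy_llm(question, shots):
    # Toy stand-in for an LLM: answers "B" (correct) only when the
    # "good" sample is in the prompt and the "bad" sample is not.
    return "B" if "good" in shots and "bad" not in shots else "A"

print(prediction_shifts("q", ["good", "bad"], "B", toy_llm))  # [1, -1]
```

Each loop iteration is one LLM query, so a $k$-shot pass costs $k+1$ calls per validation question. The learnable threshold is meant to reduce exactly this cost.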

Paper Structure

This paper contains 19 sections, 10 equations, 9 figures, 7 tables, 1 algorithm.

Figures (9)

  • Figure 1: Training framework of the proposed method. After BM25 pre-selection for each validation sample $p_i$, we run inference from $0$-shot to $k_i$-shot and update the retriever $S_\theta$ according to each sample's dynamic impact on the LLM, measured by the reward from the LLM. To reduce query cost, we update the threshold $\sigma$ whenever the LLM experiences a negative prediction change. The number of queries $k_i$ is determined by the retriever score $S_\theta$ and the threshold $\sigma$.
  • Figure 2: Case study for uncertainty of ICL.
  • Figure 3: Uncertain sample ratios.
  • Figure 4: Accuracy on easy and hard samples for proposed method and baselines.
  • Figure 5: Effect of the number of shots.
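The Figure 1 caption says that the number of queries $k_i$ depends on the retriever score $S_\theta$ and the threshold $\sigma$. It also says that $\sigma$ is updated when the LLM's prediction changes for the worse. The caption does not give the exact rules, so the sketch below assumes two hypothetical ones:

- $k_i$ is the number of ranked candidates whose score exceeds $\sigma$, capped at a budget `k_max`.
- After the first negative shift, $\sigma$ moves a fraction `lr` of the way toward the score of the sample that caused it.

Shifts are encoded as $+1$ (wrong to correct), $-1$ (correct to wrong), or $0$ (unchanged).

```python
from typing import Sequence

def num_queries(scores: Sequence[float], sigma: float, k_max: int) -> int:
    """Hypothetical rule for k_i: count candidates whose retriever score exceeds sigma."""
    return min(k_max, sum(1 for s in scores if s > sigma))

def update_sigma(
    sigma: float,
    shifts: Sequence[int],
    ranked_scores: Sequence[float],
    lr: float = 0.5,
) -> float:
    """Hypothetical update: on the first negative prediction shift, move sigma
    part of the way toward the score of the sample that caused it, so that
    similarly scored samples are less likely to be queried next time."""
    for shift, s in zip(shifts, ranked_scores):
        if shift < 0:
            return sigma + lr * max(0.0, s - sigma)
    return sigma  # no negative shift: leave the threshold unchanged

scores = [0.9, 0.7, 0.4, 0.2]  # retriever scores S_theta, already in ranked order
sigma = 0.3
k = num_queries(scores, sigma, k_max=8)          # 3 candidates clear sigma
shifts = [1, -1, 0]                              # feedback from the 1-, 2-, 3-shot queries
sigma = update_sigma(sigma, shifts, scores[:k])  # first negative shift at score 0.7
print(k, round(sigma, 2))  # 3 0.5
```

In this sketch $\sigma$ rises only on negative shifts. The per-question query budget therefore shrinks where retrieval has been misleading and stays the same otherwise, which is one way to trade LLM feedback against query cost.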