Hint-enhanced In-Context Learning wakes Large Language Models up for knowledge-intensive tasks
Yifan Wang, Qingyan Guo, Xinzhe Ni, Chufan Shi, Lemao Liu, Haiyun Jiang, Yujiu Yang
TL;DR
The paper tackles the limited effectiveness of in-context learning on knowledge-intensive tasks by introducing Hint-enhanced In-Context Learning (HICL), which extracts query-related hints from the demonstrations and adds them explicitly to the prompt. It also trains a Hint-related Example Retriever (HER) with contrastive learning to select the exemplars that yield the most useful hints, so demonstrations can be improved even for black-box LLMs. On three open-domain QA benchmarks, HICL with HER improves over standard ICL by an average of 2.89 EM and 2.52 F1 on gpt-3.5-turbo, and by 7.62 EM and 7.27 F1 on LLaMA-2-Chat-7B. The results suggest that explicit hints combined with targeted exemplar retrieval help LLMs use the knowledge already present in their demonstrations, in line with retrieval-augmented approaches, and offer a practical path for knowledge-intensive tasks.
Abstract
In-context learning (ICL) ability has emerged with the increasing scale of large language models (LLMs), enabling them to learn input-label mappings from demonstrations and perform well on downstream tasks. However, under the standard ICL setting, LLMs may sometimes neglect query-related information in demonstrations, leading to incorrect predictions. To address this limitation, we propose a new paradigm called Hint-enhanced In-Context Learning (HICL) to explore the power of ICL in open-domain question answering, an important form of knowledge-intensive task. HICL leverages LLMs' reasoning ability to extract query-related knowledge from demonstrations, then concatenates this knowledge to the prompt so that it is presented to the LLM more explicitly. Furthermore, we track the source of this knowledge to identify specific examples, and introduce a Hint-related Example Retriever (HER) to select informative examples for enhanced demonstrations. We evaluate HICL with HER on three open-domain QA benchmarks, and observe average performance gains of 2.89 EM score and 2.52 F1 score on gpt-3.5-turbo, and 7.62 EM score and 7.27 F1 score on LLaMA-2-Chat-7B, compared with the standard ICL setting.
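The two-step flow described in the abstract (extract a hint from the demonstrations, then add it to the prompt) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the prompt wording, the placement of the hint before the query, and the names `build_hint_prompt`, `build_answer_prompt`, and `hicl_answer` are all assumptions, and `llm` stands for any hypothetical text-in, text-out callable wrapping a model such as gpt-3.5-turbo.

```python
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (question, answer) pair used as a demonstration


def format_demos(demos: List[Demo]) -> str:
    """Render demonstrations as plain question/answer blocks."""
    return "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in demos)


def build_hint_prompt(demos: List[Demo], query: str) -> str:
    """Step 1 prompt: ask the model for query-related knowledge in the demos.

    The instruction wording here is an assumption, not taken from the paper.
    """
    return (
        f"{format_demos(demos)}\n\n"
        "From the examples above, state any knowledge relevant to the "
        "question below.\n"
        f"Question: {query}\nHint:"
    )


def build_answer_prompt(demos: List[Demo], query: str, hint: str) -> str:
    """Step 2 prompt: demonstrations plus the extracted hint, then the query.

    Placing the hint directly before the query is an assumption.
    """
    return f"{format_demos(demos)}\n\nHint: {hint}\nQuestion: {query}\nAnswer:"


def hicl_answer(llm: Callable[[str], str], demos: List[Demo], query: str) -> str:
    """Run hint extraction, then answer the query with the hint in the prompt."""
    hint = llm(build_hint_prompt(demos, query)).strip()
    return llm(build_answer_prompt(demos, query, hint)).strip()
```

Because `llm` is passed in as a plain callable, the same sketch works with a black-box API or a local model. Choosing which demonstrations to pass in is the retriever's job (HER in the paper) and is outside this sketch.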
