Exploring Hint Generation Approaches in Open-Domain Question Answering

Jamshid Mozafari, Abdelrahman Abdallah, Bhawna Piryani, Adam Jatowt

TL;DR

HintQA reframes context construction for open-domain QA: instead of relying on retrieved passages or LLM-generated context, it generates multiple hints per question. It ranks the hints with a convergence score (HICOS), concatenates the top-ranked hints into a context, and feeds that context to a Reader. Across TriviaQA, Natural Questions, and WebQ, HintQA consistently outperforms retrieval- and generation-based baselines, particularly with the finetuned hint generator (HiGen-FT) and convergence-based reranking, with strong gains in few-shot regimes. The approach highlights the practical potential of structured hints to guide readers and suggests avenues for improving the hint generation models and reranking, while noting limitations around data freshness and compute costs.

Abstract

Automatic Question Answering (QA) systems rely on contextual information to provide accurate answers. Commonly, contexts are prepared through either retrieval-based or generation-based methods. The former involves retrieving relevant documents from a corpus like Wikipedia, whereas the latter uses generative models such as Large Language Models (LLMs) to generate the context. In this paper, we introduce a novel context preparation approach called HINTQA, which employs Automatic Hint Generation (HG) techniques. Unlike traditional methods, HINTQA prompts LLMs to produce hints about potential answers to the question rather than generating relevant context. We evaluate our approach on three QA datasets (TriviaQA, NaturalQuestions, and Web Questions), examining how the number and order of hints impact performance. Our findings show that HINTQA surpasses both retrieval-based and generation-based approaches. We demonstrate that hints enhance the accuracy of answers more than retrieved and generated contexts.

Paper Structure (24 sections, 4 equations, 4 figures, 40 tables)

Figures (4)

  • Figure 1: Example of generated hints, context produced by LLaMA-70b, and a passage retrieved by MSS-DPR for a TriviaQA sample question, with the convergence score (HICOS) ranging from 0 (lowest) to 1 (highest). Words in blue indicate the correct answer, while those in red represent other potential answers.
  • Figure 2: The HintQA approach, where $H_i$ denotes the $i$th hint. First, the Hint Generation component produces hints for the given question. These hints are then reranked and concatenated to form a context, which is passed to the Reader component to identify the answer to the question.
  • Figure 3: Accuracy results for 200 random questions from TriviaQA, NQ, and WebQ when using LLaMA-7b as the Reader and varying the number of context sentences. The context sentences are obtained by (a) Retrieval-based (DPR), (b) Generation-based (LLaMA-70b), and (c) Hint-Generation (HiGen-FT) methods. The blue (red) columns indicate the accuracy when the total number of potential entities across sentences is at its minimum (maximum). The number of potential entities per sentence is calculated using the HICOS approach (Mozafari et al., 2024).
  • Figure 4: Exact Match values for TriviaQA, NQ, and WebQ datasets categorized by question type, based on the optimal settings for both HiGen-Va and HiGen-FT using few-shot learning on LLaMA-7b.
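
The flow described for Figure 2 (generate hints, rerank them, concatenate, pass to a Reader) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Hint` class, `build_context`, `answer`, and the `generate_hints` and `reader` callables are hypothetical names; each hint is assumed to arrive with a precomputed convergence score in [0, 1]; and sorting in descending order of that score is only one possible reranking choice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hint:
    text: str
    convergence: float  # assumed precomputed score in [0, 1]


def build_context(hints: List[Hint], top_k: int) -> str:
    """Rerank hints by convergence score (highest first) and join the top_k."""
    # Assumption: descending convergence is used as the reranking criterion.
    ranked = sorted(hints, key=lambda h: h.convergence, reverse=True)
    return " ".join(h.text for h in ranked[:top_k])


def answer(
    question: str,
    generate_hints: Callable[[str], List[Hint]],  # hypothetical hint generator
    reader: Callable[[str, str], str],            # hypothetical reader model
    top_k: int = 5,
) -> str:
    """Generate hints, build a context from the top-ranked ones, query the reader."""
    hints = generate_hints(question)
    context = build_context(hints, top_k)
    return reader(question, context)
```

In practice `generate_hints` and `reader` would wrap LLM calls; they are left as plain callables here so the reranking and concatenation step can be exercised with stand-in functions.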