TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions
Jamshid Mozafari, Anubhav Jangra, Adam Jatowt
TL;DR
TriviaHG addresses the risk that direct answers from LLMs can erode human reasoning by proposing hint-based guidance for factoid questions. It introduces a two-module pipeline to construct TriviaHG, a large-scale dataset (16,645 questions, 160,230 hints), and pairs it with automatic evaluation metrics for hint convergence (HICOS) and familiarity (HIFAS). Empirical results show that hints can effectively help users find answers, with effectiveness depending on answer difficulty. They also show that the automatic metrics align strongly with human judgments. The work enables targeted fine-tuning of generative models and has practical implications for retrieval-augmented generation, query expansion, and educational tooling by providing high-signal hints rather than direct solutions.
Abstract
Nowadays, individuals increasingly engage in dialogues with Large Language Models to seek answers to their questions. At a time when such answers are readily accessible to anyone, it becomes crucial to stimulate and preserve humans' cognitive abilities and to ensure that people maintain good reasoning skills. This study addresses these needs by proposing hints (instead of, or before, final answers) as a viable solution. We introduce a framework for automatic hint generation for factoid questions and employ it to construct TriviaHG, a novel large-scale dataset featuring 160,230 hints corresponding to 16,645 questions from the TriviaQA dataset. Additionally, we present an automatic evaluation method that measures two quality attributes of hints: Convergence and Familiarity. To evaluate the TriviaHG dataset and the proposed evaluation method, we enlisted 10 individuals to annotate 2,791 hints and tasked 6 participants with answering questions using the provided hints. The effectiveness of hints varied, with success rates of 96%, 78%, and 36% for questions with easy, medium, and hard answers, respectively. Moreover, the proposed automatic evaluation methods showed a robust correlation with the annotators' results. Overall, the findings highlight three key insights: the facilitative role of hints in resolving unknown questions, the dependence of hint quality on answer difficulty, and the feasibility of employing automatic evaluation methods for hint assessment.
