CAUS: A Dataset for Question Generation based on Human Cognition Leveraging Large Language Models
Minjung Shin, Donghyun Kim, Jeh-Kwang Ryu
TL;DR
This work introduces the CAUS dataset to study how Large Language Models generate questions under epistemic uncertainty, with the aim of mimicking human cognitive questioning. The pipeline, driven by GPT-4-0613, combines scene description generation with reasoning and five-question querying, followed by two-dimensional K-type and Q-type classification, validated against human ground truth. The results indicate that LLMs can identify uncertainties and generate diverse, relevant questions, and that the classification is highly reliable. The findings suggest that incorporating human-like questioning strategies can enhance AI's handling of uncertainty; they also point toward model-agnostic prompting and future work that incorporates social-pragmatic factors.
Abstract
We introduce the Curious About Uncertain Scene (CAUS) dataset, designed to enable Large Language Models (LLMs), specifically GPT-4, to emulate human cognitive processes for resolving uncertainties. Leveraging this dataset, we investigate the potential of LLMs to engage in questioning effectively. Our approach involves providing scene descriptions embedded with uncertainties to stimulate the generation of reasoning and queries. The queries are then classified according to multi-dimensional criteria. All procedures are facilitated by a collaborative system involving both LLMs and human researchers. Our results demonstrate that GPT-4 can effectively generate pertinent questions and grasp their nuances, particularly when given appropriate context and instructions. The study suggests that incorporating human-like questioning into Artificial Intelligence (AI) models improves their ability to manage uncertainties, paving the way for future advancements in AI.
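As a rough illustration of the procedure summarized above (scene description, reasoning, queries, then classification checked against human labels), the following Python sketch shows one way a single record and a simple agreement check could be represented. It is not the authors' implementation: the names `SceneRecord`, `LabeledQuestion`, and `label_agreement` are hypothetical, the label values are placeholders because the actual K-type and Q-type categories are not listed here, and the sketch assumes that "five-question querying" means five questions per scene.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Assumption: "five-question querying" means five questions per scene.
QUESTIONS_PER_SCENE = 5


@dataclass(frozen=True)
class LabeledQuestion:
    """One generated question with its two classification labels."""
    text: str
    k_type: str  # placeholder; the real K-type categories are not given here
    q_type: str  # placeholder; the real Q-type categories are not given here


@dataclass(frozen=True)
class SceneRecord:
    """Hypothetical container for one scene and what was generated from it."""
    scene_description: str
    reasoning: str
    questions: List[LabeledQuestion]

    def __post_init__(self) -> None:
        # Reject records that do not carry the assumed number of questions.
        if len(self.questions) != QUESTIONS_PER_SCENE:
            raise ValueError(
                f"expected {QUESTIONS_PER_SCENE} questions, "
                f"got {len(self.questions)}"
            )


def label_agreement(model_labels: Sequence[str],
                    human_labels: Sequence[str]) -> float:
    """Return the fraction of positions where model and human labels match."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not model_labels:
        raise ValueError("label sequences must not be empty")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)
```

Raw percent agreement is used here only because it is the simplest possible check; a chance-corrected statistic may be more appropriate when label categories are imbalanced, and nothing in this sketch indicates which measure the study itself reports.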
