HILL: A Hallucination Identifier for Large Language Models
Florian Leiser, Sven Eckhardt, Valentin Leuthe, Merlin Knaeble, Alexander Maedche, Gerhard Schwabe, Ali Sunyaev
TL;DR
The paper addresses hallucinations in large language models and user overreliance on them by introducing HILL, a Hallucination Identifier designed through a user-centered Wizard of Oz (WOz) process and implemented as a web-based interface that connects to the OpenAI API. It combines structured feature prioritization (nine WOz sessions and best-worst scaling) with a functional artifact that presents a confidence score, source links, and a drill-down dashboard to help users identify and question potentially hallucinated content. An evaluation with 17 online participants, a 128-question SQuAD 2.0 test, and five user interviews suggests that HILL can highlight hallucinations in wrong answers and support safer information consumption, while also revealing a risk of overreliance when hallucinations are missed. The work demonstrates the feasibility and value of user-centered AI artifacts that help users detect errors, offering a practical way to integrate such designs alongside technical mitigation approaches in real-world LLM deployments.
Abstract
Large language models (LLMs) are prone to hallucinations, i.e., nonsensical, unfaithful, and undesirable text. Users tend to overrely on LLMs and corresponding hallucinations, which can lead to misinterpretations and errors. To tackle the problem of overreliance, we propose HILL, the "Hallucination Identifier for Large Language Models". First, we identified design features for HILL with a Wizard of Oz approach with nine participants. Subsequently, we implemented HILL based on the identified design features and evaluated HILL's interface design by surveying 17 participants. Further, we investigated HILL's functionality to identify hallucinations based on an existing question-answering dataset and five user interviews. We find that HILL can correctly identify and highlight hallucinations in LLM responses, which enables users to handle LLM responses with more caution. With that, we propose an easy-to-implement adaptation to existing LLMs and demonstrate the relevance of user-centered designs of AI artifacts.
