Handling Ontology Gaps in Semantic Parsing
Andrea Bacciu, Marco Damonte, Marco Basaldella, Emilio Monti
TL;DR
The paper tackles the risk of hallucinations in Neural Semantic Parsing (NSP) under a closed-world assumption, where queries requiring unseen ontology symbols can yield wrong or unsafe answers. It introduces the Hallucination Simulation Framework (HSF) to programmatically induce and study NSP hallucinations, and a Hallucination Detection Model (HDM) that leverages three signals, namely Confidence Score (CS), Monte Carlo Dropout (MCD), and model activations, to detect and prevent hallucinated MRLs at inference time. A Hallucination Detection Dataset (HDD) is constructed by partitioning the ontology into known and unknown symbol sets and incorporating out-of-domain (OOD) and zero-shot cases, enabling evaluation across in-ontology errors, ontology gaps, and OOD inputs. Experiments on the KQA Pro dataset show that combining activations with CS (and optionally MCD) within the HDM yields the largest Macro F1 improvements: up to ~21% for ontology gaps and ~24% for OOD detection, with a modest gain (~1%) for NSP errors. The work provides a first systematic treatment of ontology gaps in closed-ontology NSP and demonstrates a practical, low-latency strategy to enhance the trustworthiness of NSP-based QA systems.
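To make the three signal families concrete, the sketch below shows one plausible way to turn them into a single feature vector for a detector. It is an illustration under stated assumptions, not the paper's implementation: the confidence score is assumed here to be the geometric mean of token probabilities, the Monte Carlo Dropout summary is assumed to be the mean and variance of that score over several stochastic passes, and `toy_forward` is a hypothetical stand-in for an NSP model run with dropout left on.

```python
import math
import random
from statistics import mean, pvariance

_RNG = random.Random(0)  # fixed seed so the toy example is repeatable


def confidence_score(token_probs):
    """Assumed CS definition: geometric mean of per-token probabilities."""
    return math.exp(mean(math.log(p) for p in token_probs))


def mc_dropout_stats(stochastic_forward, utterance, n_passes=10):
    """Run several stochastic passes; return mean and variance of the confidence."""
    scores = [confidence_score(stochastic_forward(utterance)) for _ in range(n_passes)]
    return mean(scores), pvariance(scores)


def detection_features(token_probs, stochastic_forward, utterance, activations):
    """Concatenate CS, the MCD summary, and activations into one feature vector."""
    mcd_mean, mcd_var = mc_dropout_stats(stochastic_forward, utterance)
    return [confidence_score(token_probs), mcd_mean, mcd_var, *activations]


def toy_forward(utterance):
    """Hypothetical parser with dropout on: one noisy probability per input word."""
    return [0.9 + _RNG.uniform(-0.2, 0.1) for _ in utterance.split()]


if __name__ == "__main__":
    features = detection_features(
        token_probs=[0.95, 0.90, 0.40],   # made-up greedy-decoding probabilities
        stochastic_forward=toy_forward,
        utterance="who directed the film",
        activations=[0.12, -0.70],        # made-up hidden-state values
    )
    print(features)
```

In a real system the feature vector would be passed to a trained classifier that decides whether to return the parse or abstain; the classifier itself is omitted here because the summary does not specify its architecture.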
Abstract
The majority of Neural Semantic Parsing (NSP) models are developed with the assumption that there are no concepts outside the ones such models can represent with their target symbols (closed-world assumption). This assumption leads them to generate hallucinated outputs rather than admit their lack of knowledge. Hallucinations can lead to wrong or potentially offensive responses to users. Hence, a mechanism to prevent this behavior is crucial to build trusted NSP-based Question Answering agents. To that end, we propose the Hallucination Simulation Framework (HSF), a general setting for stimulating and analyzing NSP model hallucinations. The framework can be applied to any NSP task with a closed ontology. Using the proposed framework and KQA Pro as the benchmark dataset, we assess state-of-the-art techniques for hallucination detection. We then present a novel hallucination detection strategy that exploits the computational graph of the NSP model to detect NSP hallucinations in the presence of ontology gaps and out-of-domain utterances, and to recognize NSP errors, improving the F1-Score by ~21%, ~24%, and ~1%, respectively. This is the first work in closed-ontology NSP that addresses the problem of recognizing ontology gaps. We release our code and checkpoints at https://github.com/amazon-science/handling-ontology-gaps-in-semantic-parsing.
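The simulation setting can be pictured with a small sketch that follows the known/unknown symbol split mentioned in the TL;DR. It is a minimal illustration, not the released code: the function names, the 20% hold-out fraction, the label strings, and the example symbols are all hypothetical.

```python
import random


def split_ontology(symbols, unknown_fraction=0.2, seed=0):
    """Hold out a fraction of ontology symbols as 'unknown' to the parser."""
    rng = random.Random(seed)
    shuffled = sorted(symbols)  # sort first so the split is reproducible
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * unknown_fraction)
    return set(shuffled[cut:]), set(shuffled[:cut])  # (known, unknown)


def label_example(gold_symbols, known, in_domain=True):
    """Assign a detection label to one utterance from its gold-parse symbols."""
    if not in_domain:
        return "out_of_domain"
    if set(gold_symbols) <= known:
        return "in_ontology"   # the parser could, in principle, answer this
    return "ontology_gap"      # needs a held-out symbol; the parser should abstain


if __name__ == "__main__":
    # Illustrative symbol names only; not taken from the benchmark.
    ontology = {"capital_of", "born_in", "directed_by", "population", "height"}
    known, unknown = split_ontology(ontology)
    print("held out:", unknown)
    print(label_example(["capital_of", "population"], known))
    print(label_example([], known, in_domain=False))
```

A parser trained only on examples labelled `in_ontology` can then be probed with the other two categories to see when it hallucinates a parse instead of abstaining.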
