Uni-Parser: Unified Semantic Parser for Question Answering on Knowledge Base and Database
Ye Liu, Semih Yavuz, Rui Meng, Dragomir Radev, Caiming Xiong, Yingbo Zhou
TL;DR
Uni-Parser addresses question answering over both knowledge bases and databases. It avoids an exponentially large candidate space by enumerating primitives instead of full logical forms, within a three-stage pipeline (Enumeration, Ranker, Generator). It introduces two primitive categories per modality and enables cross-modality generalization via a generator that composes primitives with operations, aided by contrastive learning and hard negative sampling. Empirical results on GrailQA, WebQSP, Spider, and WikiSQL show competitive or state-of-the-art performance, with notable gains in compositional and zero-shot settings and improved efficiency. The approach offers scalable, interpretable QA over heterogeneous structured data.
Abstract
Parsing natural language questions into executable logical forms is a useful and interpretable way to perform question answering on structured data such as knowledge bases (KB) or databases (DB). However, existing approaches to semantic parsing cannot adapt to both modalities, as they suffer from exponential growth in the number of logical form candidates and can hardly generalize to unseen data. In this work, we propose Uni-Parser, a unified semantic parser for question answering (QA) on both KB and DB. We introduce the primitive (relation and entity in KB, and table name, column name and cell value in DB) as an essential element in our framework. The number of primitives grows linearly with the number of retrieved relations in KB and DB, so we avoid dealing with an exponential number of logical form candidates. We leverage the generator to predict final logical forms by altering and composing top-ranked primitives with different operations (e.g. select, where, count). With the search space sufficiently pruned by a contrastive primitive ranker, the generator is empowered to capture the composition of primitives, enhancing its generalization ability. We achieve competitive results on multiple KB and DB QA benchmarks more efficiently, especially in the compositional and zero-shot settings.
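To make the enumerate, rank, and compose flow concrete, the following is a minimal Python sketch on a toy database. It is an illustration under stated assumptions, not the Uni-Parser implementation: the schema, the `Primitive` class, and all function names are hypothetical; the token-overlap scorer is a simple stand-in for the contrastive primitive ranker; and the hand-written `compose_count` template is a stand-in for the learned generator, which in the paper chooses and combines operations itself.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    kind: str  # "table", "column", or "value"
    text: str  # surface form, e.g. "singer.country"


def enumerate_primitives(schema, cell_values):
    """Emit one primitive per table, column, and cell value.

    The list grows linearly with the schema and retrieved values,
    unlike enumerating every full logical form.
    """
    prims = []
    for table, columns in schema.items():
        prims.append(Primitive("table", table))
        for col in columns:
            prims.append(Primitive("column", f"{table}.{col}"))
    for (table, col), values in cell_values.items():
        for value in values:
            prims.append(Primitive("value", f"{table}.{col} = '{value}'"))
    return prims


def tokens(text):
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(question, primitive):
    """Toy stand-in for a learned ranker: number of shared tokens."""
    return len(tokens(question) & tokens(primitive.text))


def rank(question, prims, k):
    """Keep the k highest-scoring primitives (ties keep enumeration order)."""
    ordered = sorted(prims, key=lambda p: overlap_score(question, p), reverse=True)
    return ordered[:k]


def compose_count(top_prims):
    """Toy stand-in for the generator: a fixed count/where template."""
    table = next(p.text for p in top_prims if p.kind == "table")
    condition = next(p.text for p in top_prims if p.kind == "value")
    return f"SELECT COUNT(*) FROM {table} WHERE {condition}"


# Hypothetical toy database, used only for illustration.
schema = {"singer": ["name", "country", "age"], "concert": ["venue", "year"]}
cell_values = {("singer", "country"): ["France", "Japan"]}
question = "Count the singer rows whose country is France"

prims = enumerate_primitives(schema, cell_values)
top = rank(question, prims, k=4)
sql = compose_count(top)
print(len(prims), "primitives enumerated")
print(sql)
```

In this sketch, adding a table or column adds a single primitive to the candidate list, whereas the number of full logical forms that could be built from the same schema grows combinatorially. Only the pruned top-k list is handed to the composition step.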
