Semantic Parsing with Candidate Expressions for Knowledge Base Question Answering
Daehwan Nam, Gary Geunbae Lee
TL;DR
This paper tackles semantic parsing for knowledge-base QA (KBQA) over large KBs by introducing a grammar augmented with candidate expressions. The approach combines type-driven constraints (sub-type inference and union types) with domain-specific candidate KB elements, implemented via tries and a mask-caching constrained decoding mechanism to achieve high accuracy and fast decoding. Empirical results on KQA Pro and Overnight show state-of-the-art performance under both strong and weak supervision, with substantial speedups in decoding. The method offers flexibility to scale to domain-specific KBs and hints at further gains when paired with advanced retrieval strategies.
Abstract
Semantic parsers convert natural language to logical forms, which can be evaluated on knowledge bases (KBs) to produce denotations. Recent semantic parsers have been built on sequence-to-sequence (seq2seq) pre-trained language models (PLMs) or large language models, which treat logical forms as sequences of tokens. To ensure syntactic and semantic validity, these parsers use grammars that enable constrained decoding. However, the grammars cannot exploit the large amount of information in KBs, even though logical forms contain representations of KB elements such as entities and relations. In this work, we propose a grammar augmented with candidate expressions for semantic parsing on a large KB with a seq2seq PLM. The grammar defines actions as production rules, and our semantic parser predicts actions during inference under constraints imposed by types and candidate expressions. We apply the grammar to knowledge base question answering, where the constraints from candidate expressions help the semantic parser generate valid KB elements. We also introduce two special rules, sub-type inference and union types, and a mask caching algorithm. In particular, sub-type inference and the mask caching algorithm greatly increase the decoding speed of our semantic parser. We experimented on two benchmarks, KQA Pro and Overnight, where the constraints from candidate expressions increased the accuracy of our semantic parser, whether it was trained with strong or weak supervision. In addition, our semantic parser decoded quickly in the experiments. Our source code is publicly available at https://github.com/daehwannam/candexpr-sp.git.
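To make the idea of decoding under candidate-expression constraints concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy vocabulary, three hypothetical KB relation names, a nested-dict trie, and plain memoization (`functools.lru_cache`) as a simple stand-in for the paper's mask caching algorithm. The names `VOCAB`, `build_trie`, `allowed_mask`, and `constrained_greedy_decode` are all illustrative.

```python
from functools import lru_cache

# Hypothetical toy vocabulary; a real parser would use the PLM tokenizer's vocabulary.
VOCAB = ["<end>", "country", "of", "citizenship", "birth", "place", "date"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}


def build_trie(expressions):
    """Build a nested-dict trie over the token-id sequences of candidate expressions."""
    root = {}
    for expr in expressions:
        node = root
        for tok in expr.split() + ["<end>"]:
            node = node.setdefault(TOKEN_ID[tok], {})
    return root


# Hypothetical candidate relations, standing in for elements collected from a KB.
TRIE = build_trie(["country of citizenship", "place of birth", "date of birth"])


@lru_cache(maxsize=None)
def allowed_mask(prefix):
    """Return a boolean mask over VOCAB marking tokens that may follow `prefix`.

    Results are memoized per prefix, so a prefix seen again (for example in
    another hypothesis or another decoding run) does not walk the trie twice.
    """
    node = TRIE
    for tok_id in prefix:
        node = node.get(tok_id)
        if node is None:  # prefix matches no candidate expression
            return tuple(False for _ in VOCAB)
    return tuple(i in node for i in range(len(VOCAB)))


def constrained_greedy_decode(score_fn):
    """Greedy decoding that only selects tokens permitted by the trie mask.

    `score_fn(prefix)` is an assumed stand-in for the model: it returns one
    score per vocabulary entry.
    """
    prefix = ()
    while True:
        mask = allowed_mask(prefix)
        scores = score_fn(prefix)
        valid = [i for i, ok in enumerate(mask) if ok]
        best = max(valid, key=lambda i: scores[i])
        if VOCAB[best] == "<end>":
            return " ".join(VOCAB[i] for i in prefix)
        prefix += (best,)
```

In this sketch, a scorer that always prefers the token "birth" would produce an invalid relation name without the mask; with the mask, every decoded string is guaranteed to be one of the candidate expressions stored in the trie.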
