A Learn-Then-Reason Model Towards Generalization in Knowledge Base Question Answering
Lingxi Zhang, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen
TL;DR
The paper tackles generalization in knowledge base question answering (KBQA), addressing a limitation of retrieve-then-reason pipelines: they do not update model parameters with new KB knowledge. It introduces KBLLaMA, a learn-then-reason framework that fine-tunes LLaMA2-7B to map natural language questions to executable logical expressions while embedding unseen KB knowledge into the model. Knowledge is organized as diverse <question, logical_expression> training pairs, which are generated via cluster-based relation selection and GPT-3.5 and then refined with KB-aware explanations and entity mentions, enabling end-to-end reasoning without external retrievers. Empirical results show state-of-the-art performance on the GrailQA and Bio-chemical benchmarks, with notable gains in cross-KB generalization and a strong entity linking component, highlighting the practical value for domain-specific KBQA and flexible knowledge integration.
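To make the <question, logical_expression> organization concrete, the following is a minimal sketch of how one such pair might be turned into a fine-tuning example. It is an illustration under stated assumptions, not the authors' implementation: the `TrainingPair` class, the `to_example` function, the prompt wording, and the sample question and expression are all hypothetical, and the summary above does not specify where explanations and entity mentions are placed in the training text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrainingPair:
    """One <question, logical_expression> pair (hypothetical container)."""
    question: str
    logical_expression: str
    explanation: str = ""  # optional KB-aware explanation
    entity_mentions: List[str] = field(default_factory=list)


def to_example(pair: TrainingPair) -> Dict[str, str]:
    """Format a pair as a prompt/completion record for supervised fine-tuning.

    The prompt layout is an assumption made for illustration only.
    """
    lines = [
        "Translate the question into a logical expression.",
        f"Question: {pair.question}",
    ]
    if pair.entity_mentions:
        lines.append("Entity mentions: " + ", ".join(pair.entity_mentions))
    if pair.explanation:
        lines.append("Explanation: " + pair.explanation)
    lines.append("Logical expression:")
    return {
        "prompt": "\n".join(lines),
        "completion": " " + pair.logical_expression,
    }


# Hypothetical pair; the relation and entity names are made up.
pair = TrainingPair(
    question="which drugs contain aspirin?",
    logical_expression="(AND drug (JOIN drug.ingredient aspirin))",
    entity_mentions=["aspirin"],
)
example = to_example(pair)
```

Each record produced this way could be fed to a standard supervised fine-tuning loop, with the model trained to emit the completion given the prompt.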
Abstract
Large-scale knowledge bases (KBs) like Freebase and Wikidata house millions of structured facts. Knowledge Base Question Answering (KBQA) provides a user-friendly way to access these valuable KBs by asking natural language questions. To improve the generalization capabilities of KBQA models, extensive research has embraced a retrieve-then-reason framework that retrieves relevant evidence for logical expression generation. These multi-stage efforts prioritize acquiring external sources but overlook the incorporation of new knowledge into their model parameters. In effect, even advanced language models and retrievers have knowledge boundaries, which limits the generalization capabilities of previous KBQA models. Therefore, this paper develops KBLLaMA, which follows a learn-then-reason framework to inject new KB knowledge into a large language model for flexible end-to-end KBQA. At the core of KBLLaMA, we study (1) how to organize new knowledge about KBQA and (2) how to facilitate the learning of the organized knowledge. Extensive experiments on various KBQA generalization tasks showcase the state-of-the-art performance of KBLLaMA. In particular, on the general benchmark GrailQA and the domain-specific benchmark Bio-chemical, KBLLaMA achieves performance gains of up to 3.8% and 9.8%, respectively, over the baselines.
