A Learn-Then-Reason Model Towards Generalization in Knowledge Base Question Answering

Lingxi Zhang, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen

TL;DR

The paper tackles generalization in knowledge base question answering (KBQA), addressing a limitation of retrieve-then-reason pipelines: they do not update model parameters with new KB knowledge. It introduces KBLLaMA, a learn-then-reason framework that fine-tunes LLaMA2-7B to map natural-language questions to executable logical expressions while embedding unseen KB knowledge into the model. Knowledge is organized as diverse <question, logical_expression> training pairs, generated via cluster-based relation selection and GPT-3.5 and then refined with KB-aware explanations and entity mentions, enabling end-to-end reasoning without external retrievers. Empirical results show state-of-the-art performance on the GrailQA and Bio-chemical benchmarks, with notable gains in cross-KB generalization and a strong entity linking component, which matters in practice for domain-specific KBQA and flexible knowledge integration.
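As a rough illustration of the <question, logical_expression> organization described above, the sketch below shows one way such a pair could be stored and rendered as a prompt/target record for supervised fine-tuning. Everything in it is an assumption made for illustration: the `TrainingPair` class, its field names, the `to_finetuning_record` helper, the prompt template, and the `ex.*` schema in the sample expression are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrainingPair:
    """One <question, logical_expression> example (hypothetical field names)."""
    question: str
    logical_expression: str
    # Surface forms of entities mentioned in the question (assumed field).
    entity_mentions: List[str] = field(default_factory=list)


def to_finetuning_record(pair: TrainingPair) -> Dict[str, str]:
    """Render a pair as a prompt/target record.

    The template below is an assumption for illustration, not the paper's.
    """
    prompt = f"Question: {pair.question}\nLogical expression:"
    return {"prompt": prompt, "target": " " + pair.logical_expression}


# Made-up example: the "ex.*" relation names do not come from any real KB.
example = TrainingPair(
    question="which drugs treat migraine?",
    logical_expression="(AND ex.drug (JOIN ex.drug.treats [migraine]))",
    entity_mentions=["migraine"],
)
record = to_finetuning_record(example)
```

Keeping the entity mention as text inside the target expression, rather than as a KB identifier, is one possible design choice; the identifier would then have to be resolved by a separate linking step.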

Abstract

Large-scale knowledge bases (KBs) like Freebase and Wikidata house millions of structured facts. Knowledge Base Question Answering (KBQA) provides a user-friendly way to access these valuable KBs by asking natural language questions. To improve the generalization capabilities of KBQA models, extensive research has embraced a retrieve-then-reason framework that retrieves relevant evidence for logical expression generation. These multi-stage efforts prioritize acquiring external sources but overlook the incorporation of new knowledge into their model parameters. In effect, even advanced language models and retrievers have knowledge boundaries, which limits the generalization capabilities of previous KBQA models. Therefore, this paper develops KBLLaMA, which follows a learn-then-reason framework to inject new KB knowledge into a large language model for flexible end-to-end KBQA. At the core of KBLLaMA, we study (1) how to organize new knowledge about KBQA and (2) how to facilitate the learning of the organized knowledge. Extensive experiments on various KBQA generalization tasks showcase the state-of-the-art performance of KBLLaMA. In particular, on the general benchmark GrailQA and the domain-specific benchmark Bio-chemical, KBLLaMA achieves performance gains of up to 3.8% and 9.8%, respectively, over the baselines.

Paper Structure

This paper contains 22 sections, 3 figures, 5 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Approaches towards generalization in KBQA. (a) Examples of In-KB and Cross-KB generalization. The new KB knowledge is colored in purple. (b) Comparison between the traditional retrieve-then-reason approach and our proposed learn-then-reason approach. The former overlooks the incorporation of new KB knowledge into their model parameters, thereby limiting their generalization capabilities.
  • Figure 2: The overview of the development of KBLLaMA.
  • Figure 3: Effect of augmented data amount. We evaluate the performance of KBLLaMA-base on GrailQA (All) and the transfer performance of KBLLaMA-base on MQA.