
KB-Plugin: A Plug-and-play Framework for Large Language Models to Induce Programs over Low-resourced Knowledge Bases

Jiajie Zhang, Shulin Cao, Linmei Hu, Ling Feng, Lei Hou, Juanzi Li

TL;DR

KB-Plugin presents a plug-and-play approach that decouples KB schema knowledge from program induction by injecting a schema plugin and a transferable PI plugin into a lightweight LLM. The schema plugin is learned via self-supervised triple completion, encoding rich schema information, while the PI plugin is trained on multiple alias-augmented source KBs to generalize to unseen schemas. Through a four-step transfer process, the PI plugin is then applied to target low-resource KBs with a target schema plugin and constrained decoding to ensure valid programs. Experiments across five heterogeneous KBQA datasets show that this method achieves competitive or superior performance with a 25x smaller backbone LLM, and even surpasses some supervised baselines, highlighting strong cross-domain transfer and efficiency. The work advances practical KB reasoning for knowledge-intensive questions in low-resource settings.

Abstract

Program induction (PI) has become a promising paradigm for using knowledge bases (KBs) to help large language models (LLMs) answer complex knowledge-intensive questions. Nonetheless, PI typically relies on a large number of parallel question-program pairs to make the LLM aware of the schema of the given KB, and is thus challenging for many low-resourced KBs that lack annotated data. To this end, we propose KB-Plugin, a plug-and-play framework that enables LLMs to induce programs over any low-resourced KB. Firstly, KB-Plugin adopts self-supervised learning to encode the detailed schema information of a given KB into a pluggable module, namely the schema plugin. Secondly, KB-Plugin utilizes abundant annotated data from a rich-resourced KB to train another pluggable module, namely the PI plugin, which can help the LLM extract question-relevant schema information from the schema plugin of any KB and utilize this information to induce programs over this KB. Experiments on five heterogeneous KBQA datasets show that KB-Plugin achieves better or comparable performance with a 25$\times$ smaller backbone LLM compared to SoTA PI methods for low-resourced KBs, and even approaches the performance of supervised methods. Our code and data are available at https://github.com/THU-KEG/KB-Plugin.
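
To make the self-supervised step more concrete, the sketch below shows one way KB triples could be turned into completion examples for learning a schema plugin. It is only an illustration: the function name `make_completion_examples`, the `head | relation | ?` prompt format, the choice to mask the tail, and the toy triples are all assumptions, not the format used in the KB-Plugin repository.

```python
from typing import Dict, List, Tuple

# A KB fact as (head, relation, tail). Hypothetical representation.
Triple = Tuple[str, str, str]


def make_completion_examples(triples: List[Triple]) -> List[Dict[str, str]]:
    """Turn each triple into a fill-in-the-tail training example.

    No human annotation is needed: the supervision signal is the KB itself.
    """
    examples = []
    for head, relation, tail in triples:
        examples.append(
            {
                # Assumed prompt format: the tail is masked with "?".
                "input": f"{head} | {relation} | ?",
                "target": tail,
            }
        )
    return examples


# Toy triples for illustration only.
toy_triples: List[Triple] = [
    ("LeBron James", "member of sports team", "Los Angeles Lakers"),
    ("Los Angeles Lakers", "instance of", "basketball team"),
]

examples = make_completion_examples(toy_triples)
for example in examples:
    print(example["input"], "->", example["target"])
```

Because the examples come directly from the KB, the same procedure could in principle be run on a low-resourced target KB that has no question-program annotations.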

Paper Structure (28 sections, 4 equations, 3 figures, 8 tables)

Figures (3)

  • Figure 1: Illustration of KB-Plugin. By simply plugging the schema plugin of a KB and the PI plugin, the LLM is injected with the schema information of this KB and the ability to induce programs over it.
  • Figure 2: Overview of our plugin learning and transfer framework: (a) Generate multiple source KBs with different schemas, along with augmented source-domain data, via alias replacement; (b) Learn an individual schema plugin for each source KB and for the target KB via a self-supervised, schema-relevant triple completion task; (c) Train the PI plugin by inducing programs over each source KB while it is plugged into the LLM along with the corresponding schema plugin; (d) Transfer the PI plugin by plugging it into the LLM with the schema plugin of the target KB and inducing programs over the target KB with constrained decoding.
  • Figure 3: KB-Plugin performance with different numbers of generated source KBs.
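
Steps (a) and (d) of Figure 2 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: `replace_aliases` and `constrained_pick` are hypothetical helpers, the `Find(...) -> Relate(...)` program syntax and the candidate scores are made up, and real constrained decoding operates token by token inside the LLM rather than over whole schema items.

```python
import re
from typing import Dict, Set


def replace_aliases(text: str, alias_map: Dict[str, str]) -> str:
    """Step (a), simplified: rename schema items to get a differently-worded schema."""
    if not alias_map:
        return text
    # Longest names first so that overlapping names match predictably;
    # a single regex pass avoids re-replacing text that is already an alias.
    names = sorted(alias_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda match: alias_map[match.group(0)], text)


def constrained_pick(scores: Dict[str, float], valid_items: Set[str]) -> str:
    """Step (d), simplified: keep only candidates that exist in the target KB."""
    allowed = {item: score for item, score in scores.items() if item in valid_items}
    if not allowed:
        raise ValueError("no candidate is a valid schema item of the target KB")
    return max(allowed, key=allowed.get)


# Hypothetical program and alias map, for illustration only.
program = 'Find("LeBron James") -> Relate("member of sports team")'
alias_map = {"member of sports team": "plays for"}
augmented_program = replace_aliases(program, alias_map)

# Hypothetical model scores; "coach of" scores highest but is not in the target KB.
candidate_scores = {"plays for": 0.7, "coach of": 0.9, "born in": 0.1}
target_schema = {"plays for", "born in"}
chosen_relation = constrained_pick(candidate_scores, target_schema)

print(augmented_program)
print(chosen_relation)
```

The second helper shows why constraining the output matters: the unconstrained top candidate would yield a program that cannot be executed over the target KB, whereas restricting the choice to the target schema keeps the induced program valid.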