
Code-Style In-Context Learning for Knowledge-Based Question Answering

Zhijie Nie, Richong Zhang, Zhongyuan Wang, Xudong Liu

TL;DR

The paper tackles knowledge-based question answering under limited labeled data and formatting fragility in LLM-generated logic forms. It introduces KB-Coder, a training-free approach that reframes logic-form generation as Python function-call sequences defined by seven meta-functions, with retrieval and a program interpreter producing answers via SPARQL. The method achieves state-of-the-art performance in few-shot settings on WebQSP, GrailQA, and GraphQ, while offering competitive results against fully supervised baselines when data is abundant, and it improves zero-shot generalization by leveraging a related reference relation. Overall, KB-Coder provides a training-free, interpretable, and practical KBQA framework that reduces formatting errors and supports rapid deployment across domains.

Abstract

Current methods for Knowledge-Based Question Answering (KBQA) usually rely on complex training techniques and model frameworks, which limits their practical application. Recently, the emergence of In-Context Learning (ICL) capabilities in Large Language Models (LLMs) has provided a simple, training-free semantic parsing paradigm for KBQA: given a small number of questions and their labeled logical forms as demonstration examples, an LLM can understand the task intent and generate the logical form for a new question. However, current powerful LLMs have had little exposure to logical forms during pre-training, resulting in a high format error rate. To solve this problem, we propose a code-style in-context learning method for KBQA, which converts the generation of unfamiliar logical forms into the code generation process that is more familiar to LLMs. Experimental results on three mainstream datasets show that our method dramatically mitigates the formatting error problem in generating logical forms while achieving a new SOTA on WebQSP, GrailQA, and GraphQ under the few-shot setting. The code and supplementary files are released at https://github.com/Arthurizijar/KB-Coder.

Paper Structure (40 sections, 1 equation, 5 figures, 6 tables)

Figures (5)

  • Figure 1: A comparison between our proposed ICL method and the existing method. Intuitively, our method achieves better performance by transforming the original KBQA task into a more familiar code form for the LLM.
  • Figure 2: The tree structure (left) and the corresponding function call sequence (right) of S-Expression (COUNT (AND (JOIN nationality m.09c7w0) (JOIN profession m.015cjr))).
  • Figure 3: An illustration of the inference process of KB-Coder.
  • Figure 4: Effect analysis of three factors in ICL on a subset of 500 questions from the GrailQA local dev set; solid lines show F1 score and dashed lines show FER (format error rate).
  • Figure 5: Ablation Study on GrailQA.
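
The correspondence described in the Figure 2 caption, between the tree structure of an S-Expression and a function call sequence, can be illustrated with a minimal Python sketch. It parses the S-Expression from the caption into a tree and emits one call per internal node in post-order. The helper names (tokenize, parse, to_calls), the expressionN variable naming, and the emitted call signatures are assumptions made for illustration; they are not the seven meta-functions defined in the paper, whose exact signatures are not given in this summary.

```python
# Illustrative sketch only: helper names and the emitted call format are
# hypothetical, not the paper's actual meta-function definitions.

SEXPR = "(COUNT (AND (JOIN nationality m.09c7w0) (JOIN profession m.015cjr)))"


def tokenize(sexpr):
    """Split an S-Expression string into parentheses and symbols."""
    return sexpr.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens and return a nested list (the expression tree)."""
    token = tokens.pop(0)
    if token != "(":
        return token  # a leaf: relation name or entity id
    node = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.pop(0)  # drop the closing ")"
    return node


def to_calls(tree):
    """Flatten the tree into a post-order list of function-call strings."""
    calls = []

    def visit(node):
        if isinstance(node, str):
            return repr(node)  # leaves become string literals
        op, *args = node
        arg_names = [visit(arg) for arg in args]  # children first
        name = f"expression{len(calls) + 1}"
        calls.append(f"{name} = {op}({', '.join(arg_names)})")
        return name

    visit(tree)
    return calls


calls = to_calls(parse(tokenize(SEXPR)))
for line in calls:
    print(line)
```

Each nested sub-expression becomes a separate assignment, so the deeply parenthesized form is replaced by a flat sequence of calls in which later steps refer to earlier results by name.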