How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering

Jinxin Liu; Shulin Cao; Jiaxin Shi; Tingjian Zhang; Lunyiu Nie; Linmei Hu; Lei Hou; Juanzi Li

TL;DR

The paper addresses how proficient large language models are at handling formal languages used in semantic parsing for KBQA, revealing substantial differences across languages. It introduces two probing tasks—Formal Language Understanding and Formal Language Generation—and proposes in-context learning with skeleton-based demonstration selection, plus entity linking and chain-of-thought techniques, to assess inherent capabilities without fine-tuning. Across KoPL, SPARQL, and Lambda DCS on datasets like KQA Pro, GrailQA, and Overnight, the study finds LLMs approach human performance in understanding but struggle with accurate generation of logical forms, with KoPL being the most model-friendly. These findings guide the selection of formal languages for LLM-driven KBQA systems and point to data-generation strategies that can bolster parser training, especially in resource-constrained settings.

Abstract

Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases. A typical approach to KBQA is semantic parsing, which translates a question into an executable logical form in a formal language. Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance. However, although LLMs have been shown to solve some KBQA problems, there has been little discussion of how their proficiency differs across the formal languages used in semantic parsing. In this work, we propose to evaluate the ability of LLMs to understand and generate differently structured logical forms by examining the inter-conversion of natural and formal language through in-context learning. Extensive experiments with models of different sizes show that state-of-the-art LLMs can understand formal languages as well as humans, but generating correct logical forms given a few examples remains a challenge. Most importantly, our results also indicate that LLMs are considerably sensitive to the choice of formal language. In general, a formal language with a lower formalization level, i.e., one that is more similar to natural language, is friendlier to LLMs.
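To make the two probing directions concrete, the following is a minimal sketch of how a few-shot prompt for each direction could be assembled. It is an illustration under stated assumptions, not the authors' implementation: the `Pair` class, the `build_prompt` function, the prompt labels, and the toy SPARQL-style demonstrations are all hypothetical, and demonstration selection, entity linking, and the call to the model are left out.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    question: str      # natural language question
    logical_form: str  # its logical form in some formal language


# Toy demonstrations (hypothetical, not taken from any dataset in the paper).
DEMOS = [
    Pair("What is the capital of France?",
         "SELECT ?x WHERE { :France :capital ?x }"),
    Pair("Who directed Inception?",
         "SELECT ?x WHERE { :Inception :director ?x }"),
]


def build_prompt(demos, query, direction):
    """Assemble a few-shot prompt for one probing direction.

    direction="understand": logical form -> natural language question
    direction="generate":   natural language question -> logical form
    """
    if direction not in ("understand", "generate"):
        raise ValueError("direction must be 'understand' or 'generate'")
    understand = direction == "understand"
    src_label, tgt_label = (
        ("Logical form", "Question") if understand
        else ("Question", "Logical form")
    )
    lines = []
    for demo in demos:
        src = demo.logical_form if understand else demo.question
        tgt = demo.question if understand else demo.logical_form
        lines += [f"{src_label}: {src}", f"{tgt_label}: {tgt}", ""]
    # The query is appended with an empty target slot for the model to fill.
    lines += [f"{src_label}: {query}", f"{tgt_label}:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(DEMOS, "Who wrote Hamlet?", "generate"))
```

The same demonstrations serve both directions; only the roles of source and target are swapped, which mirrors the inter-conversion framing in the abstract.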

Paper Structure (43 sections, 1 equation, 3 figures, 20 tables)


Introduction
Related Work
Evaluation Task Definition
Formal Language Understanding
Formal Language Generation
Formal Language and Datasets
Implementation
Formal Language Understanding
Structure-Preserving Principle
Content-Preserving Principle
Formal Language Generation
Entity Linking
Chain-of-Thought Generation
Experiment Setup
Investigated Models
...and 28 more sections

Figures (3)

Figure 1: A simple illustration for the probing task of both formal language understanding and generation.
Figure 2: An example of a natural language question and its corresponding logical forms in KoPL, SPARQL, and Lambda DCS.
Figure 3: Formal language generation performance of Text-Davinci-003 with various numbers of demonstration examples. The entity linking tag indicates whether entity linking is used to detect the entities in the input and add the names of their 2-hop-related entities and relations to the input. Note that the maximum number of demonstrations differs between formal languages because of the LLM's context length. Each data point is based on 3 runs; details are given in the appendix.
