Lexicalization Is All You Need: Examining the Impact of Lexical Knowledge in a Compositional QALD System

David Maria Schmidt, Mohammad Fazleh Elahi, Philipp Cimiano

TL;DR

The paper tackles the lexical gap in QALD by arguing that explicit lexical knowledge, when used within a compositional framework, substantially improves question answering over linked data. It introduces a dependency-based, DUDES-driven QA pipeline that leverages a Lemon lexical resource to map natural language to SPARQL through bottom-up semantic composition and KB linking. On QALD-9, the approach achieves a micro $F_1$ of $0.72$, surpassing state-of-the-art systems, and demonstrates that large language models have limited capacity to exploit lexical knowledge in a fully compositional manner. The work highlights the value of lexicalization and compositionality for QA over knowledge graphs and suggests hybrid directions that combine symbolic composition with the strengths of LLMs. Practically, it points to the need for scalable lexical resources and efficient disambiguation mechanisms to bridge natural language and structured knowledge bases.

Abstract

In this paper, we examine the impact of lexicalization on Question Answering over Linked Data (QALD). It is well known that one of the key challenges in interpreting natural language questions with respect to SPARQL lies in bridging the lexical gap, that is, mapping the words in the question to the correct vocabulary elements. We argue in this paper that lexicalization, that is, explicit knowledge about the potential interpretations of a word with respect to the given vocabulary, significantly eases the task and increases the performance of QA systems. Towards this goal, we present a compositional QA system that can leverage explicit lexical knowledge in a compositional manner to infer the meaning of a question in terms of a SPARQL query. We show that such a system, given lexical knowledge, performs well beyond current QA systems, achieving up to a $35.8\%$ increase in the micro $F_1$ score compared to the best QA system on QALD-9. This shows the importance and potential of including explicit lexical knowledge. In contrast, we show that LLMs have limited abilities to exploit lexical knowledge, with only marginal improvements compared to a version without lexical knowledge. This shows that LLMs have no ability to compositionally interpret a question on the basis of the meaning of its parts, a key feature of compositional approaches. Taken together, our work shows new avenues for QALD research, emphasizing the importance of lexicalization and compositionality.
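Since the headline results are reported as micro $F_1$, a small sketch of that metric may help in reading the numbers. It assumes the common pooled definition, in which true positives, false positives, and false negatives are summed over all questions before precision and recall are computed. Whether the paper's evaluation follows exactly this convention is an assumption, and the function name `micro_f1` and the toy answer sets are hypothetical.

```python
from typing import Iterable, Set, Tuple


def micro_f1(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Pooled (micro) F1 over (gold_answers, predicted_answers) pairs.

    Assumption: counts are summed across all questions first, and
    precision/recall are computed once from the pooled counts.
    """
    tp = fp = fn = 0
    for gold, pred in pairs:
        tp += len(gold & pred)   # answers that are both gold and predicted
        fp += len(pred - gold)   # predicted answers that are not gold
        fn += len(gold - pred)   # gold answers that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): two questions with gold and predicted answer sets.
example = [
    ({"a", "b"}, {"a"}),        # one gold answer missed
    ({"c"}, {"c", "d"}),        # one spurious prediction
]
score = micro_f1(example)
```

Because the counts are pooled, questions with many answers weigh more heavily than they would under a macro average, which takes the mean of per-question scores.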

Paper Structure

This paper contains 24 sections, 3 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Schema of the compositional question answering approach using DUDES
  • Figure 2: Illustration of exemplary DUDES and their composition

Theorems & Definitions (2)

  • Definition: Dependency-based Underspecified Discourse Representation Structures (DUDES)
  • Definition: DUDES Composition