TIARA: Multi-grained Retrieval for Robust Question Answering over Large Knowledge Bases

Yiheng Shu, Zhiwei Yu, Yuhan Li, Börje F. Karlsson, Tingting Ma, Yuzhong Qu, Chin-Yew Lin

TL;DR

TIARA targets the robustness and generalization gaps of KBQA over large knowledge bases. It combines multi-grained retrieval of entities, exemplary logical forms, and schema items with constrained decoding to generate executable logical forms. A T5 generator takes the question together with the retrieved contexts as input, while constrained decoding prunes invalid outputs through operator rules and prefix-tree (trie) constraints. On GrailQA and WebQuestionsSP (WebQSP), TIARA reports state-of-the-art performance across i.i.d., compositional, and zero-shot settings, with notable gains when exemplary logical forms or schema contexts are supplied. The results indicate that retrieval-augmented generation, coupled with decoding-time constraints, improves grounding, syntactic validity, and overall KBQA reliability on large-scale KBs.

Abstract

Pre-trained language models (PLMs) have shown their effectiveness in multiple scenarios. However, KBQA remains challenging, especially regarding coverage and generalization settings. This is due to two main factors: i) understanding the semantics of both questions and relevant knowledge from the KB; ii) generating executable logical forms with both semantic and syntactic correctness. In this paper, we present a new KBQA model, TIARA, which addresses those issues by applying multi-grained retrieval to help the PLM focus on the most relevant KB contexts, viz., entities, exemplary logical forms, and schema items. Moreover, constrained decoding is used to control the output space and reduce generation errors. Experiments over important benchmarks demonstrate the effectiveness of our approach. TIARA outperforms previous SOTA, including those using PLMs or oracle entity annotations, by at least 4.1 and 1.1 F1 points on GrailQA and WebQuestionsSP, respectively.

Paper Structure (40 sections, 3 equations, 4 figures, 9 tables)

This paper contains 40 sections, 3 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Overview of TIARA. 1) Entity retrieval grounds the mention to entity m.0l2l_. 2) Exemplary logical form retrieval enumerates logical forms starting from the entity m.0l2l_ or the number 13.9, and ranks them. 3) Schema retrieval independently grounds the most related schema items. 4) Retrieved multi-grained contexts are then fed to the PLM for generation. 5) Constrained decoding controls the schema search space during logical form generation.
  • Figure 2: Schema retrieval learns if a question and a schema item are a match or not.
  • Figure 3: Given a set of retrieved contexts, T5 generates the target logical form.
  • Figure 4: An example of a trie (prefix tree) that stores KB classes. Each edge represents a token that the PLM can select.
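
The trie described in the Figure 4 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the dot-based `tokenize` helper, and the score dictionary are assumptions chosen for illustration, and a real system would use the PLM's own subword tokenizer and vocabulary scores. The sketch stores KB class names as token paths and, at each decoding step, restricts the choice to tokens that extend a stored path.

```python
import re
from typing import Dict, List, Optional, Sequence


class TokenTrie:
    """Prefix tree over token sequences; each edge is one token."""

    def __init__(self) -> None:
        self.children: Dict[str, "TokenTrie"] = {}
        self.is_end = False  # True if a stored name ends at this node

    def insert(self, tokens: Sequence[str]) -> None:
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, TokenTrie())
        node.is_end = True

    def allowed_next(self, prefix: Sequence[str]) -> List[str]:
        """Return the tokens that may follow `prefix`, or [] if it is invalid."""
        node: Optional[TokenTrie] = self
        for tok in prefix:
            node = node.children.get(tok)
            if node is None:
                return []
        return sorted(node.children)


def tokenize(name: str) -> List[str]:
    # Hypothetical tokenizer: split on "." and keep the dot as its own token.
    return re.findall(r"[^.]+|\.", name)


def build_trie(names: Sequence[str]) -> TokenTrie:
    trie = TokenTrie()
    for name in names:
        trie.insert(tokenize(name))
    return trie


def constrained_step(
    trie: TokenTrie, prefix: Sequence[str], scores: Dict[str, float]
) -> Optional[str]:
    """Pick the highest-scoring token among those the trie allows."""
    allowed = trie.allowed_next(prefix)
    if not allowed:
        return None
    return max(allowed, key=lambda tok: scores.get(tok, float("-inf")))


# Illustrative class names only; a real KB would supply its own schema.
trie = build_trie(["medicine.drug", "medicine.disease", "music.album"])

# "film" has the highest (made-up) score but is not in the trie, so it is pruned.
first = constrained_step(trie, [], {"film": 0.9, "medicine": 0.5, "music": 0.1})
```

Here `first` is `"medicine"`, and `trie.allowed_next(["medicine", "."])` then offers only `"disease"` and `"drug"`, so the generated class name always follows a path stored in the trie.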