Graph Guided Question Answer Generation for Procedural Question-Answering

Hai X. Pham, Isma Hadji, Xinnuo Xu, Ziedune Degutyte, Jay Rainey, Evangelos Kazakos, Afsaneh Fazly, Georgios Tzimiropoulos, Brais Martinez

TL;DR

This work tackles task-specific QA on procedural text by introducing graph-guided QA data generation that leverages Abstract Meaning Representation and flow graphs to exhaustively generate in-domain QA pairs from recipes. The approach supports single-instruction and temporal QA, with optional augmentation from large language models to boost language diversity. Experiments show that small models trained on the generated data can match or surpass GPT-3/ChatGPT performance on the target task, especially when semantic coverage is high; LLM augmentation further enhances language quality. The method yields high semantic coverage and question diversity, suggesting a practical pathway to deploy efficient, domain-specific QA systems on devices with limited compute.

Abstract

In this paper, we focus on task-specific question answering (QA). To this end, we introduce a method for generating exhaustive and high-quality training data, which allows us to train compact (e.g., able to run on a mobile device), task-specific QA models that are competitive against GPT variants. The key technological enabler is a novel mechanism for automatic question-answer generation from procedural text, which can ingest large amounts of textual instructions and produce exhaustive in-domain QA training data. While current QA data generation methods can produce well-formed and varied data, their non-exhaustive nature is sub-optimal for training a QA model. In contrast, we leverage the highly structured nature of procedural text and represent each step and the overall flow of the procedure as graphs. We then condition on graph nodes to automatically generate QA pairs in an exhaustive and controllable manner. Comprehensive evaluations of our method show that: (1) small models trained with our data achieve excellent performance on the target QA task, even exceeding that of GPT-3 and ChatGPT despite being several orders of magnitude smaller; and (2) semantic coverage is the key indicator for downstream QA performance. Crucially, while large language models excel at syntactic diversity, this does not necessarily result in improvements on the end QA model. In contrast, the higher semantic coverage provided by our method is critical for QA performance.
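To make the idea of conditioning on graph nodes more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hand-written, simplified role map for the example instruction "Cook chicken and other ingredients in the pot for 20 minutes to prepare the soup" (not the output of a real AMR parser) and a hypothetical set of question templates keyed by role label. The names `step`, `TEMPLATES`, and `generate_qa` are illustrative only. Each role filler in turn becomes the answer, and the remaining fillers form the question context, so every node yields one QA pair.

```python
# Toy, hand-written role map for a single instruction.
# ASSUMPTION: this only mimics the shape of an AMR graph; it is not parser output.
step = {
    "verb": "cook",
    "roles": {
        ":ARG1": "chicken and other ingredients",
        ":location": "in the pot",
        ":duration": "for 20 minutes",
        ":purpose": "to prepare the soup",
    },
}

# ASSUMPTION: hypothetical question templates, one per role label.
TEMPLATES = {
    ":ARG1": "What do we {verb} {context}?",
    ":location": "Where do we {verb} {context}?",
    ":duration": "How long do we {verb} {context}?",
    ":purpose": "Why do we {verb} {context}?",
}


def generate_qa(step):
    """Return one (question, answer) pair per role that has a template."""
    pairs = []
    for role, answer in step["roles"].items():
        template = TEMPLATES.get(role)
        if template is None:
            # Roles without a template are skipped in this sketch.
            continue
        # The question context is built from every role except the targeted one.
        context = " ".join(
            filler for other, filler in step["roles"].items() if other != role
        )
        question = template.format(verb=step["verb"], context=context)
        pairs.append((question, answer))
    return pairs


if __name__ == "__main__":
    for question, answer in generate_qa(step):
        print(f"Q: {question}\nA: {answer}\n")
```

Because the loop visits every role in the map, the number of generated pairs grows with the number of annotated nodes, which is one simple way to read the "exhaustive and controllable" property described above.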

Paper Structure

This paper contains 60 sections, 10 figures, 8 tables, and 5 algorithms.

Figures (10)

  • Figure 1: AMR example. Linearized AMR graph of the sentence "Cook chicken and other ingredients in the pot over medium heat for 20 minutes to prepare the soup".
  • Figure 2: Flow graph example. The action flow (sub-)graph of the highlighted text section in (a) is shown in (b) where word tokens are grouped together to form complete semantic entities belonging to one of the main categories. The semantic graph is further augmented with implicit entities to represent entities that are omitted from the text.
  • Figure 3: Role-specific QA. Three questions are created by targeting different roles in the input AMR.
  • Figure 4: Example questions generated for the concept of :ARG2 role.
  • Figure 5: Examples of generating "What do we do...?" questions.
  • ...and 5 more figures