Integration of LLM Quality Assurance into an NLG System

Ching-Yi Chen, Johanna Heininger, Adela Schneider, Christian Eckard, Andreas Madsack, Robert Weißgraeber

TL;DR

This work addresses scalable quality assurance for NLG outputs by incorporating a large language model to detect and correct grammar and spelling errors while tracing error sources to the underlying rule-based generation. It implements a human-in-the-loop framework where the LLM suggests edits and a human editor approves or rejects them, with corrections potentially informing future text generation. The evaluation on multilingual basketball reports (English→French, German, Spanish) shows high precision and strong suggestion quality but language-dependent recall and authenticity challenges, underscoring the need for language-specific prompts and cautious reliance on automated revisions. Overall, the paper demonstrates the practicality of integrating LLM-driven QA into NLG pipelines and outlines concrete directions for improving robustness and expanding QA dimensions.

Abstract

In this paper, we present a system that uses a Large Language Model (LLM) to perform grammar and spelling correction as a component of Quality Assurance (QA) for texts generated by NLG systems, which is important for text production in real-world scenarios. Evaluating the results of the system on work-in-progress sports news texts in three languages, we show that it is able to deliver acceptable corrections.

Paper Structure

This paper contains 20 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Overview of integrating LLM QA into an NLG system. It demonstrates a human-in-the-loop workflow; the text quality is improved iteratively by repeating steps 2 (LLM suggests corrections) and 3 (human editor checks and carries out corrections).
  • Figure 2: The partial JSON output of our LLM QA system. The id, text, and containers came from our rule sets in the NLG system. The revised text, explanation, and intention came from the LLM QA system. The revised text, explanation, and text fields are used in the LLM-based evaluation.
  • Figure 3: The prompt for the LLM QA system.
  • Figure 4: The prompt for generating ground truth in order to calculate precision and recall.
  • Figure 5: The prompt to rate suggestion quality.
  • ...and 3 more figures