Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering

Arijit Ghosh Chowdhury, Aman Chadha

TL;DR

The paper addresses QA robustness under natural distribution shifts with a data-centric augmentation pipeline: in-the-wild LLMs generate contexts conditioned on SQuAD questions and then produce corresponding QA pairs. Augmenting the real SQuAD training data of a RoBERTa-Base extractive QA model with this generated data improves robustness on natural distribution-shift benchmarks (NewWiki, NYT, Reddit, Amazon). Mixing real and generated data gives the best balance between robustness and in-domain accuracy, and generating both contexts and questions is crucial for generalization, whereas context-only or question-only generation has limited or negative effects. The approach is a scalable, practical way to improve domain generalization in QA and informs data augmentation strategies for robust NLP systems. Future work includes broader QA-generation comparisons and scaling to larger models.

Abstract

Robustness in Natural Language Processing continues to be a pertinent issue, where state-of-the-art models underperform under naturally shifted distributions. In the context of Question Answering, work on domain adaptation methods continues to be a growing body of research. However, very little attention has been given to the notion of domain generalization under natural distribution shifts, where the target domain is unknown. With drastic improvements in the quality of and access to generative models, we answer the question: How do generated datasets influence the performance of QA models under natural distribution shifts? We perform experiments on four different datasets under varying amounts of distribution shift, and analyze how "in-the-wild" generation can help achieve domain generalization. We take a two-step generation approach, generating both contexts and QA pairs to augment existing datasets. Through our experiments, we demonstrate how augmenting reading comprehension datasets with generated data leads to better robustness towards natural distribution shifts.
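
The two-step generation approach described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `QAExample`, `generate_examples`, and `mix` are hypothetical, the two generator callables stand in for whatever LLM calls are actually used, and the answer-in-context filter and the simple concatenate-and-shuffle mixing are assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class QAExample:
    question: str
    context: str
    answer: str
    source: str  # "real" or "generated"


def generate_examples(
    seed_questions: Sequence[str],
    generate_context: Callable[[str], str],
    generate_qa: Callable[[str], Tuple[str, str]],
) -> List[QAExample]:
    """Step 1: generate a context for each seed question.
    Step 2: generate a (question, answer) pair from that context."""
    examples = []
    for seed in seed_questions:
        context = generate_context(seed)
        question, answer = generate_qa(context)
        # Assumption: an extractive model needs the answer to appear
        # verbatim in the context, so other pairs are dropped.
        if answer and answer in context:
            examples.append(QAExample(question, context, answer, "generated"))
    return examples


def mix(real: Sequence[QAExample], generated: Sequence[QAExample],
        seed: int = 0) -> List[QAExample]:
    """Combine real and generated examples into one shuffled training set."""
    combined = list(real) + list(generated)
    random.Random(seed).shuffle(combined)
    return combined


# Toy stand-ins for the LLM calls (hypothetical, for illustration only).
def toy_context(question: str) -> str:
    return f"A short passage about the question '{question}'. The event took place in Paris."


def toy_qa(context: str) -> Tuple[str, str]:
    return ("Where did the event take place?", "Paris")
```

In practice `toy_context` and `toy_qa` would be replaced by prompts to a generative model, and the mixed list would be passed to the QA model's training loop.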

Paper Structure (13 sections, 1 figure, 7 tables)

Figures (1)

  • Figure 1: Overview of the generation system. Our method creates a generated dataset which is then augmented with the real dataset to train a question answering model.