Zero-Shot Complex Question-Answering on Long Scientific Documents

Wanting Wang

TL;DR

Long scientific documents, especially in the social sciences, pose substantial QA challenges because of their length and the need for multi-span extraction and multi-hop reasoning. The authors propose a zero-shot QA pipeline that integrates off-the-shelf pre-trained LMs, Retrieval Augmented Generation, bridge-entity decomposition, and answer ensembling to handle four predefined question types without fine-tuning, validated on the MLPsych dataset of 52 social psychology papers. They introduce two novel evaluation metrics, Similar Match and Mentions, to capture semantic and substring-level alignment for long-form answers. Results show robust gains across extractive and long-answer tasks, demonstrating a practical, accessible way for researchers to extract methodological and analytical information from full papers without heavy ML expertise.
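
To make the idea of a substring-level metric concrete, here is a minimal Python sketch. It is not the paper's definition of Mentions, which this summary does not spell out; the function names `normalize` and `mention_score`, the normalization rules, and the example strings are all assumptions made for illustration. The sketch scores a generated answer by the fraction of gold spans it contains verbatim after lowercasing and whitespace collapsing.

```python
import re
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace (an assumed normalization)."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def mention_score(prediction: str, gold_spans: Sequence[str]) -> float:
    """Fraction of gold spans that occur as substrings of the prediction.

    Hypothetical stand-in for a substring-level metric; the paper's own
    Mentions metric may be defined differently.
    """
    if not gold_spans:
        return 0.0
    pred = normalize(prediction)
    hits = sum(1 for span in gold_spans if normalize(span) in pred)
    return hits / len(gold_spans)


if __name__ == "__main__":
    answer = "Participants completed the Big Five Inventory and a  Stroop task."
    gold = ["big five inventory", "Stroop task", "implicit association test"]
    # Two of the three gold spans appear in the answer.
    print(mention_score(answer, gold))
```

A plain substring check like this rewards long answers that mention the right items even when their wording differs from the reference elsewhere, which is presumably why a separate semantic measure is paired with it.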

Abstract

With the rapid development of Transformer-based language models, reading comprehension tasks on short documents and simple questions have been largely addressed. Long documents, particularly scientific documents that are densely packed with knowledge discovered and developed by humans, remain relatively unexplored. These documents often come with a set of complex and more realistic questions, which adds to the difficulty. We present a zero-shot pipeline framework that enables social science researchers to perform question-answering on full-length research papers, for questions that are complex yet follow predetermined formats, without requiring machine learning expertise. Our approach integrates pre-trained language models to handle challenging scenarios including multi-span extraction, multi-hop reasoning, and long-answer generation. Evaluating on MLPsych, a novel dataset of social psychology papers with annotated complex questions, we demonstrate that our framework achieves strong performance through a combination of extractive and generative models. This work advances document understanding capabilities for the social sciences while providing practical tools for researchers.

Paper Structure

This paper contains 25 sections, 3 figures, 9 tables.

Figures (3)

  • Figure 1: Our proposed QA pipeline framework to tackle the 4 complex QA tasks using zero-shot inference with pre-trained LMs. Each question represents a distinct set of challenges (see Table \ref{tab:eval-descri} for more details).
  • Figure 2: Example of RAG-Enhanced Entity Extraction
  • Figure 3: Example of Multi-Single-Hop