Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences

Sai Koneru, Jian Wu, Sarah Rajtmajer

TL;DR

This work explores whether current large language models can discern evidence that supports or refutes specific hypotheses from the text of scientific abstracts, and shares a novel dataset for the task of scientific hypothesis evidencing, built from community-driven annotations of studies in the social sciences.

Abstract

Hypothesis formulation and testing are central to empirical research. A strong hypothesis is a best guess based on existing evidence and informed by a comprehensive view of relevant literature. However, with the exponential increase in the number of scientific articles published annually, manually aggregating and synthesizing the evidence related to a given hypothesis is a challenge. Our work explores the ability of current large language models (LLMs) to discern evidence that supports or refutes specific hypotheses based on the text of scientific abstracts. We share a novel dataset for the task of scientific hypothesis evidencing using community-driven annotations of studies in the social sciences. We compare the performance of LLMs to several state-of-the-art benchmarks and highlight opportunities for future research in this area. The dataset is available at https://github.com/Sai90000/ScientificHypothesisEvidencing.git

Paper Structure (19 sections, 4 figures, 7 tables)

This paper contains 19 sections, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Exemplar collaborative review document structure for one question.
  • Figure 2: Sentence pair classification based on pre-trained embeddings for concatenated hypothesis-abstract pairs.
  • Figure 3: Semantic search-based sample selection for few-shot learning.
  • Figure 4: Average macro-F1 of LLMs with different prompt templates and temperature settings.
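
Figure 4 reports macro-F1, which is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name `macro_f1` and the label strings in the usage note are hypothetical and chosen only for illustration; the paper's actual label set is not stated in this section.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed

    scores = []
    for label in sorted(set(gold) | set(pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For instance, with the hypothetical gold labels `["support", "support", "refute", "neutral"]` and predictions `["support", "refute", "refute", "neutral"]`, the per-class F1 scores are 2/3, 2/3, and 1, so `macro_f1` returns 7/9.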