LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis

Hamed Babaei Giglou; Jennifer D'Souza; Sören Auer

TL;DR

The LLMs4Synthesis framework addresses the need for rapid, coherent, and contextually rich integration of key scientific insights, leveraging both open-source and proprietary LLMs, and establishes nine detailed quality criteria for evaluating syntheses.

Abstract

In response to the growing complexity and volume of scientific literature, this paper introduces the LLMs4Synthesis framework, designed to enhance the capabilities of Large Language Models (LLMs) in generating high-quality scientific syntheses. This framework addresses the need for rapid, coherent, and contextually rich integration of scientific insights, leveraging both open-source and proprietary LLMs. It also examines the effectiveness of LLMs in evaluating the integrity and reliability of these syntheses, alleviating inadequacies in current quantitative metrics. Our study contributes to this field by developing a novel methodology for processing scientific papers, defining new synthesis types, and establishing nine detailed quality criteria for evaluating syntheses. The integration of LLMs with reinforcement learning and AI feedback is proposed to optimize synthesis quality, ensuring alignment with established criteria. The LLMs4Synthesis framework and its components are made available, promising to enhance both the generation and evaluation processes in scientific research synthesis.

Paper Structure (25 sections, 5 equations, 3 figures, 3 tables)

Introduction
The ORKG Scientific Synthesis Dataset
Synthesis Generation Data Preparation
Scientific Synthesis Generation
Synthesis Quality Evaluation
Nine Criteria of Synthesis Quality
LLM Evaluation of Synthesis Quality
Results
Human Evaluation of Synthesis Quality
Survey Setup
Survey Participant Characteristics
Survey Results
The LLMs4Synthesis Framework
Supervised Finetuning (SFT)
Modeling Feedback
...and 10 more sections

Figures (3)

Figure 1: Evaluation results from the GPT-4 LLM evaluator (purple and green bars) and a Prolific human survey (red and blue bars) for syntheses generated by Mistral and GPT-4. The data includes averaged scores across three synthesis types and five domains—Chemistry, Computer Science, Earth Science, Linguistics, and Sociology.
Figure 2: LLMs4Synthesis Framework using Supervised Fine-Tuning and Reinforcement Learning (Lambert et al., 2022). Note: SFT is optional, but we achieved better performance when it was included.
Figure 3: Consistency comparison of the GPT-4 evaluator between the Vanilla and SFT+RLAIF (w/ GPT-4 Features) models, assessed through three evaluations on the test set.
