Extracting Research Instruments from Educational Literature Using LLMs

Jiseung Yoo, Curran Mahowald, Meiyu Li, Wei Ai

TL;DR

The paper addresses the challenge of extracting educational research instruments from the literature at scale. It proposes an LLM-based information extraction pipeline that uses a domain-specific schema and multi-step prompts to identify instrument names, types, respondents, constructs, and outcomes in methods sections. Key contributions include a three-step pipeline (method detection, named entity recognition (NER), and relation extraction (RE)), a dictionary-based standardization of instrument names, and an evaluation showing competitive accuracy and substantial efficiency gains over baseline approaches. By providing structured instrument metadata in an accessible format, the work enables large-scale synthesis and decision support for researchers, educators, and policymakers.

Abstract

Large Language Models (LLMs) are transforming information extraction from academic literature, offering new possibilities for knowledge management. This study presents an LLM-based system designed to extract detailed information about research instruments used in the education field, including their names, types, target respondents, measured constructs, and outcomes. Using multi-step prompting and a domain-specific data schema, it generates structured outputs optimized for educational research. Our evaluation shows that this system significantly outperforms other approaches, particularly in identifying instrument names and detailed information. This demonstrates the potential of LLM-powered information extraction in educational contexts, offering a systematic way to organize research instrument information. The ability to aggregate such information at scale enhances accessibility for researchers and education leaders, facilitating informed decision-making in educational research and policy.
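
To make the idea of a domain-specific schema and dictionary-based name standardization concrete, here is a minimal sketch in Python. It is not the authors' implementation: the class `InstrumentRecord`, the function `standardize_name`, the `ALIASES` dictionary, and all example values are hypothetical and chosen only for illustration. The five fields mirror the information the abstract says the system extracts.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class InstrumentRecord:
    """Hypothetical record with the five fields named in the abstract."""
    name: str
    instrument_type: str
    respondents: List[str] = field(default_factory=list)
    constructs: List[str] = field(default_factory=list)
    outcomes: List[str] = field(default_factory=list)


# Illustrative alias dictionary (an assumption, not taken from the paper):
# maps lowercase surface forms to one canonical instrument name.
ALIASES: Dict[str, str] = {
    "mslq": "Motivated Strategies for Learning Questionnaire",
    "motivated strategies for learning questionnaire":
        "Motivated Strategies for Learning Questionnaire",
}


def standardize_name(raw: str, aliases: Dict[str, str]) -> str:
    """Return the canonical name if the normalized form is in the dictionary.

    Normalization lowercases the string and collapses whitespace.
    Unknown names are returned stripped but otherwise unchanged.
    """
    key = " ".join(raw.lower().split())
    return aliases.get(key, raw.strip())


# Example values below are invented for illustration only.
record = InstrumentRecord(
    name=standardize_name("  MSLQ ", ALIASES),
    instrument_type="questionnaire",
    respondents=["students"],
    constructs=["motivation"],
)
record_as_dict = asdict(record)  # plain dict, ready for JSON serialization
```

A fixed record type of this kind lets outputs from many papers be aggregated and compared field by field, which is the use case the abstract describes.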

Paper Structure

This paper contains 9 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the system pipeline.
  • Figure 2: Performance comparison across different prompts and input text types. 'Ex' represents extraction, 'Sum' represents summarization, and 'Dec' represents decision. The highest F1 score (0.665) is achieved using a combination of summarization, extraction, and decision on the method section excerpt.