SyllabusQA: A Course Logistics Question Answering Dataset
Nigel Fernandez, Alexander Scarlatos, Andrew Lan
TL;DR
SyllabusQA introduces the first public dataset of real course syllabi tailored to course-logistics question answering (QA), pairing 63 syllabi across 36 majors with 5,078 open-ended QA pairs. The authors propose a comprehensive annotation protocol and a new evaluation metric, Fact-QA, to assess the factual grounding of answers. Fact-QA reveals a gap in factual accuracy between state-of-the-art LLMs and humans, even where the two score similarly on surface-level textual-similarity metrics. The authors show that retrieval-augmented approaches and fine-tuning improve performance, yet long documents and adversarial questions remain challenges for reliable teaching-assistant systems. The work provides a benchmark and publicly released resources to advance open-source, privacy-aware automated teaching assistants in education.
Abstract
Automated teaching assistants and chatbots have significant potential to reduce the workload of human instructors, especially for logistics-related question answering, which is important to students yet repetitive for instructors. However, due to privacy concerns, there is a lack of publicly available datasets. We introduce SyllabusQA, an open-source dataset with 63 real course syllabi covering 36 majors, containing 5,078 open-ended course logistics-related question-answer pairs that are diverse in both question types and answer formats. Since many logistics-related questions contain critical information like the date of an exam, it is important to evaluate the factuality of answers. We benchmark several strong baselines on this task, from large language model prompting to retrieval-augmented generation. We introduce Fact-QA, an LLM-based (GPT-4) metric for evaluating the factuality of predicted answers. We find that despite performing close to humans on traditional metrics of textual similarity, there remains a significant gap between automated approaches and humans in terms of fact precision.
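The abstract reports results in terms of fact precision but does not spell out how that quantity is computed. The sketch below assumes an atomic-fact formulation. Each answer is split into short atomic facts, and a judge decides whether each fact is supported by the other answer. Fact precision is then the share of predicted facts that the reference answer supports. In Fact-QA the judge is GPT-4. Here, the function names, the toy `substring_judge` stand-in, and the recall/F1 companions are illustrative assumptions, not the authors' implementation or prompts.

```python
from typing import Callable, List

# Judge(fact, text) -> True if `fact` is supported by `text`.
# Assumption: in Fact-QA this role is played by an LLM (GPT-4); here it is any callable.
Judge = Callable[[str, str], bool]


def fact_precision(pred_facts: List[str], reference_answer: str, judge: Judge) -> float:
    """Share of atomic facts in the predicted answer that the reference answer supports."""
    if not pred_facts:
        return 0.0  # assumption: an answer with no facts earns no precision credit
    supported = sum(judge(fact, reference_answer) for fact in pred_facts)
    return supported / len(pred_facts)


def fact_recall(ref_facts: List[str], predicted_answer: str, judge: Judge) -> float:
    """Share of atomic facts in the reference answer that the predicted answer covers."""
    if not ref_facts:
        return 0.0
    covered = sum(judge(fact, predicted_answer) for fact in ref_facts)
    return covered / len(ref_facts)


def fact_f1(precision: float, recall: float) -> float:
    """Harmonic mean of fact precision and fact recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def substring_judge(fact: str, text: str) -> bool:
    """Hypothetical toy judge: case-insensitive substring match, NOT an LLM."""
    return fact.lower() in text.lower()


if __name__ == "__main__":
    reference = "The midterm exam is on March 3 in Room 101."
    prediction = "The midterm exam is on March 3 and is worth 30% of the grade."
    # Hand-written atomic facts; an LLM would normally extract these.
    pred_facts = ["The midterm exam is on March 3", "The midterm is worth 30% of the grade"]
    ref_facts = ["The midterm exam is on March 3", "The midterm is in Room 101"]

    p = fact_precision(pred_facts, reference, substring_judge)
    r = fact_recall(ref_facts, prediction, substring_judge)
    print(p, r, fact_f1(p, r))  # 0.5 0.5 0.5
```

In this toy example, the unsupported claim about the grade weight halves precision even though the exam date, the fact a student most needs, is correct. This illustrates why a fact-level score can diverge from textual-similarity metrics.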
