VerAs: Verify then Assess STEM Lab Reports
Berk Atil, Mahsa Sheikhi Karizaki, Rebecca J. Passonneau
TL;DR
VerAs tackles automated rubric-based assessment of long-form STEM writing with a two-module, OpenQA-inspired architecture that first verifies which sentences are relevant to a rubric dimension and then grades the selected content. Using dual encoders and an ordinal loss, VerAs assigns a score from 0 to 5 for each rubric dimension. It is evaluated on college physics lab reports and middle-school essays. The approach outperforms strong baselines on total and per-dimension metrics. Ablations confirm the verifier's value and show that the method can generalize to different rubric structures. This work enables scalable, formative feedback in STEM education and points to future enhancements with broader domain coverage and integration with large language models.
Abstract
With an increasing focus in STEM education on critical thinking skills, science writing plays an ever more important role in curricula that stress inquiry skills. A recently published dataset of two sets of college-level lab reports from an inquiry-based physics curriculum relies on analytic assessment rubrics that utilize multiple dimensions, specifying subject matter knowledge and general components of good explanations. Each analytic dimension is assessed on a 6-point scale, to provide detailed feedback to students that can help them improve their science writing skills. Manual assessment can be slow and difficult to calibrate for consistency across all students in large classes. While much work exists on automated assessment of open-ended questions in STEM subjects, there has been far less work on long-form writing such as lab reports. We present an end-to-end neural architecture that has separate verifier and assessment modules, inspired by approaches to Open Domain Question Answering (OpenQA). VerAs first verifies whether a report contains any content relevant to a given rubric dimension, and if so, assesses the relevant sentences. On the lab reports, VerAs outperforms multiple baselines based on OpenQA systems or Automated Essay Scoring (AES). VerAs also performs well on an analytic rubric for middle school physics essays.
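The verify-then-assess control flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for learned neural encoders, the similarity-based `assess` function stands in for a trained grader, and the `threshold` and `top_k` values and all function names are hypothetical. Returning 0 when no sentence is judged relevant is also an assumption about how the lowest score is assigned.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'encoder'; a stand-in for a learned neural encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def verify(sentences, dimension, threshold=0.2, top_k=3):
    """Verifier step: keep up to top_k sentences similar enough to the rubric dimension."""
    query = embed(dimension)
    scored = [(cosine(embed(s), query), s) for s in sentences]
    relevant = [pair for pair in scored if pair[0] >= threshold]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in relevant[:top_k]]


def assess(relevant, dimension, max_score=5):
    """Toy grader: maps mean similarity of the selected sentences to 1..max_score."""
    query = embed(dimension)
    mean_sim = sum(cosine(embed(s), query) for s in relevant) / len(relevant)
    return max(1, min(max_score, round(mean_sim * max_score)))


def score_dimension(sentences, dimension):
    """Verify first; grade only if some relevant content was found."""
    relevant = verify(sentences, dimension)
    if not relevant:
        return 0  # assumption: no relevant content means the lowest score
    return assess(relevant, dimension)


if __name__ == "__main__":
    dimension = "relationship between pendulum length and period"
    report = [
        "The period increased with the pendulum length.",
        "We cleaned the table afterwards.",
    ]
    print(verify(report, dimension))
    print(score_dimension(report, dimension))
```

The point of the two-stage design, as the abstract describes it, is that the grader only sees sentences the verifier selected, so off-topic text in a long report does not influence the score for a given dimension.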
