Rendering Transparency to Ranking in Educational Assessment via Bayesian Comparative Judgement
Andy Gray, Alma Rahat, Stephen Lindsay, Jen Pearson, Tom Crick
TL;DR
The paper addresses the lack of transparency in educational assessment by leveraging Bayesian Comparative Judgement (BCJ) and its multi-criteria extension (MBCJ) to produce probabilistic, interpretable rankings. By integrating prior information and providing rank posteriors with uncertainty measures, BCJ improves auditability and accountability, while MBCJ decomposes judgements across multiple learning outcomes (LOs) for granular insights. The authors demonstrate these approaches on a real UK higher-education dataset, supplemented by questionnaires, workshops, and expert interviews to examine perceived transparency, reliability, and practicality. They find that BCJ enhances consistency and interpretability relative to traditional marking, with MBCJ offering even stronger transparency at the LO level, though feedback mechanisms remain a crucial area for development before broader adoption in high-stakes contexts.
Abstract
Ensuring transparency in educational assessment is increasingly critical, particularly post-pandemic, as demand grows for fairer and more reliable evaluation methods. Comparative Judgement (CJ) offers a promising alternative to traditional assessments, yet concerns remain about its perceived opacity. This paper examines how Bayesian Comparative Judgement (BCJ) enhances transparency by integrating prior information into the judgement process, providing a structured, data-driven approach that improves interpretability and accountability. BCJ assigns probabilities to judgement outcomes, offering quantifiable measures of uncertainty and deeper insights into decision confidence. By systematically tracking how prior data and successive judgements inform final rankings, BCJ clarifies the assessment process and helps identify assessor disagreements. Multi-criteria BCJ extends this by evaluating multiple learning outcomes (LOs) independently, preserving the richness of CJ while producing transparent, granular rankings aligned with specific assessment goals. It also enables a holistic ranking derived from individual LOs, ensuring comprehensive evaluations without compromising detailed feedback. Using a real higher education dataset with professional markers in the UK, we demonstrate BCJ's quantitative rigour and ability to clarify ranking rationales. Through qualitative analysis and discussions with experienced CJ practitioners, we explore its effectiveness in contexts where transparency is crucial, such as high-stakes national assessments. We highlight the benefits and limitations of BCJ, offering insights into its real-world application across various educational settings.
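To make the idea of integrating prior information and assigning probabilities to judgement outcomes concrete, the following is a minimal sketch for a single pair of scripts. It assumes a conjugate Beta prior over the probability that script A is preferred to script B; this modelling choice, the class name `PairBelief`, and the example judgements are illustrative assumptions rather than details stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class PairBelief:
    """Belief about P(script A is preferred to script B) for one pair.

    Assumption: a Beta(alpha, beta) prior updated by binary judgements.
    This is an illustrative model, not necessarily the authors' own.
    """
    alpha: float = 1.0  # prior pseudo-count for "A preferred" (1, 1 = uniform)
    beta: float = 1.0   # prior pseudo-count for "B preferred"

    def update(self, a_preferred: bool) -> None:
        # Each judgement adds one count to the winning side, so the
        # contribution of every assessor decision stays traceable.
        if a_preferred:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self) -> float:
        # Posterior mean probability that A is preferred.
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        # Posterior variance: a simple measure of remaining uncertainty.
        n = self.alpha + self.beta
        return (self.alpha * self.beta) / (n * n * (n + 1))


# Hypothetical judgements: A preferred three times, B preferred once.
belief = PairBelief()
for a_preferred in [True, True, False, True]:
    belief.update(a_preferred)

print(f"P(A preferred) ~ {belief.mean():.3f}, variance {belief.variance():.4f}")
```

Under these assumptions, split decisions between assessors show up directly as a posterior mean near 0.5, and the variance shrinks as more judgements are collected for the pair, which is one way disagreement and decision confidence could be surfaced.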
