Context is Enough: Empirical Validation of $\textit{Sequentiality}$ on Essays
Amal Sunny, Advay Gupta, Vishnu Sreekumar
TL;DR
This work empirically validates a contextual variant of the sequentiality measure of narrative flow, showing that the contextual term aligns more closely with human judgments of Organization and Cohesion in ASAP++ and ELLIPSE than the topic-only or original formulations. Zero-shot prompted LLMs yield competitive trait scores. Even so, adding the contextual sequentiality term to standard linguistic features provides additional predictive value, and on at least some datasets this combined model outperforms the zero-shot LLM. The analysis uses ordinal regression with AIC for model selection, cross-dataset evaluation, and a careful comparison against linguistic baselines, demonstrating the additive utility of explicitly modeling sentence-to-sentence flow. These findings support context-based sequentiality as an interpretable, complementary feature for automated essay scoring and related NLP tasks, while acknowledging the limitations of mid-sized open-source LLMs and the need to further isolate discourse-specific effects.
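The TL;DR mentions ordinal regression with AIC for model selection. As a minimal sketch of the selection step only, assuming a cumulative-link ordinal model with one coefficient per feature plus one threshold per boundary between adjacent score levels, AIC = 2k - 2 ln L can be computed and compared across candidate feature sets. Fitting the models themselves is out of scope here, and `ordinal_aic` and `select_by_aic` are hypothetical helper names, not the authors' code.

```python
from typing import Dict, Tuple


def ordinal_aic(log_likelihood: float, n_features: int, n_levels: int) -> float:
    """AIC = 2k - 2 ln L for a cumulative-link (ordinal) regression.

    Assumption: k counts one coefficient per feature plus (n_levels - 1)
    ordered thresholds; the thresholds absorb the intercept.
    """
    k = n_features + (n_levels - 1)
    return 2 * k - 2 * log_likelihood


def select_by_aic(fits: Dict[str, Tuple[float, int]], n_levels: int) -> str:
    """Return the name of the feature set with the lowest AIC.

    `fits` maps a feature-set name to (maximized log-likelihood, number of
    features), e.g. from fitting the same ordinal model with different
    predictors on the same trait scores.
    """
    return min(
        fits,
        key=lambda name: ordinal_aic(fits[name][0], fits[name][1], n_levels),
    )
```

Because each added feature raises k by one, a richer feature set is preferred only if it improves the log-likelihood by more than one unit per added feature. This penalty is what lets AIC indicate whether a term such as contextual sequentiality adds value beyond a linguistic baseline.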
Abstract
Recent work has proposed using Large Language Models (LLMs) to quantify narrative flow through a measure called sequentiality, which combines topic and contextual terms. A recent critique argued that the original results were confounded by how topics were selected for the topic-based component, and noted that the metric had not been validated against ground-truth measures of flow. That work proposed using only the contextual term as a more conceptually valid and interpretable alternative. In this paper, we empirically validate that proposal. Using two essay datasets with human-annotated trait scores, ASAP++ and ELLIPSE, we show that the contextual version of sequentiality aligns more closely with human assessments of discourse-level traits such as Organization and Cohesion. While zero-shot prompted LLMs predict trait scores more accurately than the contextual measure alone, the contextual measure adds more predictive value than both the topic-only and original sequentiality formulations when combined with standard linguistic features. Notably, this combination also outperforms the zero-shot LLM predictions, highlighting the value of explicitly modeling sentence-to-sentence flow. Our findings support the use of context-based sequentiality as a validated, interpretable, and complementary feature for automated essay scoring and related NLP tasks.
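The abstract describes sequentiality as combining a topic term and a contextual term. The sketch below assumes the per-token formulation from the original sequentiality work, since this section does not restate the definition. For sentence s_i with topic T, the topic term is -log p(s_i | T)/|s_i|, the contextual term is log p(s_i | T, s_1..s_{i-1})/|s_i|, sequentiality is their sum, and essay-level scores average over sentences. `LogProbFn`, `sequentiality_terms`, and `toy_logprob` are hypothetical names. `toy_logprob` is a word-overlap stand-in for a real LLM scorer, and using something like the essay prompt as T is likewise an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer (an assumption, not a real API): given a sentence and a
# conditioning string, return (summed log-probability of the sentence's
# tokens, number of tokens in the sentence).
LogProbFn = Callable[[str, str], Tuple[float, int]]


def sequentiality_terms(
    sentences: Sequence[str], topic: str, logprob: LogProbFn
) -> Dict[str, float]:
    """Average per-token topic and contextual terms over a non-empty essay.

    Assumed formulation, for sentence s_i:
      topic term      = -log p(s_i | T) / |s_i|
      contextual term =  log p(s_i | T, s_1..s_{i-1}) / |s_i|
      sequentiality   = topic term + contextual term
    """
    topic_terms: List[float] = []
    context_terms: List[float] = []
    for i, sentence in enumerate(sentences):
        history = " ".join(sentences[:i])
        # Condition on the topic only.
        lp_topic, n_tokens = logprob(sentence, topic)
        # Condition on the topic plus all preceding sentences.
        lp_context, _ = logprob(sentence, f"{topic} {history}".strip())
        topic_terms.append(-lp_topic / n_tokens)
        context_terms.append(lp_context / n_tokens)
    m = len(sentences)
    topic_avg = sum(topic_terms) / m
    context_avg = sum(context_terms) / m
    return {
        "topic": topic_avg,
        "contextual": context_avg,  # the term the contextual variant keeps
        "sequentiality": topic_avg + context_avg,
    }


def toy_logprob(sentence: str, condition: str) -> Tuple[float, int]:
    """Toy stand-in for an LLM: words already seen in the condition cost less."""
    words = sentence.split()
    seen = set(condition.split())
    overlap = sum(1 for w in words if w in seen)
    return -(2.0 * len(words) - overlap), len(words)
```

For example, `sequentiality_terms(["the cat sat", "the cat ran"], "cat", toy_logprob)` gives a contextual term of -1.5 and a sequentiality of about 0.167. The first sentence contributes zero sequentiality because it has no preceding context, so its two conditionings coincide.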
