Automated Speaking Assessment of Conversation Tests with Novel Graph-based Modeling on Spoken Response Coherence
Jiun-Ting Li, Bi-Cheng Yan, Tien-Hong Lo, Yi-Cheng Wang, Yung-Chang Hsu, Berlin Chen
TL;DR
Problem: automated speaking assessment of conversation tests must account for coherence across turns to accurately judge L2 proficiency. Approach: a hierarchical graph modeling framework (EHGM) couples a contextual LM with multi-level graphs that encode semantically related words, intra-response SPO actions, and inter-response discourse, with fusion at the regressor stage to predict $\hat{Y}$. Contributions: (1) enhanced hierarchical graph modeling of coherence, (2) integration strategy for hierarchical context into holistic scoring, and (3) publicly available code and preprocessing. Findings: on the NICT-JLE benchmark, the proposed method yields substantial improvements over strong baselines in RMSE, PCC, and margin-accuracy, highlighting coherence-aware representations as key for accurate ASAC. Significance: enables more reliable, interpretable automatic assessment of spoken proficiency in conversational settings and informs future coherence-aware language assessment research.
Abstract
Automated speaking assessment in conversation tests (ASAC) aims to evaluate the overall speaking proficiency of an L2 (second-language) speaker in a setting where an interlocutor interacts with one or more candidates. Although prior ASAC approaches have shown promising performance on their respective datasets, little research has focused on incorporating the coherence of a conversation's logical flow into the grading model. To address this gap, we propose a hierarchical graph model that captures both broad inter-response interactions (e.g., discourse relations) and fine-grained semantic information (e.g., semantic words and speaker intents), and fuses these graph representations with contextual information for the final prediction. Experimental results on the NICT-JLE benchmark dataset suggest that the proposed approach yields considerable improvements in prediction accuracy across several assessment metrics compared with strong baselines. These results also highlight the importance of investigating coherence-related facets of spoken responses in ASAC.
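To make the assessment metrics concrete, the following is a minimal sketch of RMSE, PCC, and margin-accuracy, the three metrics named in the TL;DR. It is an illustration, not the authors' evaluation code. The function names are hypothetical, and the definition of margin-accuracy used here (the fraction of predictions within a fixed margin of the reference score) and its default margin of 1.0 are assumptions, since the text above does not define them.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted scores."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def margin_accuracy(y_true, y_pred, margin=1.0):
    """Assumed definition: share of predictions within `margin` of the reference."""
    hits = sum(abs(t - p) <= margin for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy proficiency levels, invented for illustration (not NICT-JLE data).
reference = [1, 2, 3, 4]
predicted = [1, 2, 3, 6]

print(rmse(reference, predicted))
print(pcc(reference, predicted))
print(margin_accuracy(reference, predicted, margin=1.0))
```

Lower RMSE is better, while higher PCC and margin-accuracy are better. PCC is undefined when either score list is constant, which this sketch does not guard against.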
