Interaction Matters: An Evaluation Framework for Interactive Dialogue Assessment on English Second Language Conversations
Rena Gao, Carsten Roever, Jey Han Lau
TL;DR
The paper addresses the lack of English as a Second Language (ESL) dialogue evaluation datasets by introducing SLEDE and a two-level annotation framework that combines four dialogue-level interactivity labels with 17 micro-level linguistic features. Using classical predictive models (logistic regression, random forest, and naive Bayes) with minimal reliance on deep pretraining, it demonstrates that micro-level cues can strongly predict interactivity quality, often outperforming raw-text baselines such as BERT. Key contributions include the annotated corpus, a transparent evaluation framework, and insights into high-impact and label-specific micro-features, with implications for ESL assessment and feedback. The work advances fine-grained ESL dialogue evaluation and offers a pathway toward more informative language proficiency assessment tied to interactive communication skills.
Abstract
We present an evaluation framework for interactive dialogue assessment in the context of English as a Second Language (ESL) speakers. Our framework collects dialogue-level interactivity labels (e.g., topic management; 4 labels in total) and micro-level span features (e.g., backchannels; 17 features in total). Given our annotated data, we study how the micro-level features influence the (higher-level) interactivity quality of ESL dialogues by constructing various machine learning-based models. Our results demonstrate that certain micro-level features, such as reference words (e.g., she, her, he), strongly correlate with interactivity quality, revealing new insights about the interaction between higher-level dialogue quality and lower-level linguistic signals. Our framework also provides a means to assess ESL communication, which is useful for language assessment.
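To make the two-level setup concrete, the following is a minimal sketch, not the authors' pipeline, of how micro-level span annotations might be turned into per-dialogue count vectors and related to a dialogue-level label. The toy dialogues, the 1-5 score scale, the feature names `backchannel` and `reference_word`, and the helpers `feature_counts` and `pearson` are all assumptions made for illustration; the paper itself fits machine learning models rather than computing a simple correlation.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy annotations: each dialogue carries a list of micro-level
# span labels and one dialogue-level interactivity score (made-up 1-5 scale).
DIALOGUES = [
    {"spans": ["backchannel", "reference_word", "reference_word"], "topic_management": 4},
    {"spans": ["backchannel"], "topic_management": 2},
    {"spans": ["reference_word", "reference_word", "reference_word", "backchannel"], "topic_management": 5},
    {"spans": [], "topic_management": 1},
]

# Only two of the 17 micro-level features are used here, for brevity.
FEATURES = ["backchannel", "reference_word"]


def feature_counts(spans, feature_names):
    """Count how often each micro-level feature occurs in one dialogue."""
    counts = Counter(spans)
    return [counts[name] for name in feature_names]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# One count vector per dialogue, aligned with FEATURES.
vectors = [feature_counts(d["spans"], FEATURES) for d in DIALOGUES]
scores = [d["topic_management"] for d in DIALOGUES]

# Correlate each micro-level feature's counts with the dialogue-level score.
correlations = {
    name: pearson([v[i] for v in vectors], scores)
    for i, name in enumerate(FEATURES)
}

for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
```

In a fuller version, the same count vectors would serve as inputs to a classifier predicting each of the four interactivity labels, and feature importance would be read from the fitted model rather than from pairwise correlations.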
