ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues
John Mendonça, Isabel Trancoso, Alon Lavie
TL;DR
GenResCoh is a large-scale, multilingual dataset of generated responses with accompanying explanations that targets turn-level coherence, addressing the need for open-source dialogue evaluation. ECoh, a family of lightweight, LoRA-finetuned evaluators based on Qwen1.5-Chat, achieves multilingual coherence detection that can match or exceed GPT-3.5-Turbo while producing explanations of comparable quality. The evaluators generalize across languages, including to unseen languages, and to external FED-turn annotations, which highlights the practical advantages of open-source evaluation over closed models like GPT-4. Limitations include restricted language coverage, potential generation biases, and dependence on synthetic data, which point to expanding the set of languages and comparing against more diverse baselines. Overall, the work offers scalable, accessible tools for multilingual dialogue coherence assessment.
Abstract
Although GPT-4 has been heralded as the new standard for dialogue evaluation, its closed-source nature poses challenges for the community. Motivated by the need for lightweight, open-source, and multilingual dialogue evaluators, this paper introduces GenResCoh (Generated Responses targeting Coherence). GenResCoh is a novel LLM-generated dataset comprising over 130k negative and positive responses with accompanying explanations, seeded from XDailyDialog and XPersona and covering English, French, German, Italian, and Chinese. Leveraging GenResCoh, we propose ECoh (Evaluation of Coherence), a family of evaluators trained to assess response coherence across multiple languages. Experimental results demonstrate that ECoh achieves multilingual detection capabilities superior to the teacher model (GPT-3.5-Turbo) on GenResCoh, despite being based on a much smaller architecture. Furthermore, the explanations provided by ECoh closely align in quality with those generated by the teacher model.
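To make the notion of multilingual coherence detection concrete, the following is a minimal sketch of how binary coherent/incoherent predictions could be scored per language. It is an illustration under stated assumptions, not the paper's evaluation code: the `Example` record, its field names, the language codes, and the `accuracy_by_language` helper are all hypothetical, and plain accuracy is used here only as a simple stand-in for whatever metric is actually reported.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    """One hypothetical evaluation record (field names are assumptions)."""
    language: str        # e.g. "en", "fr", "de", "it", "zh"
    context: List[str]   # preceding dialogue turns
    response: str        # candidate response to be judged
    coherent: bool       # gold label: True = positive, False = negative


def accuracy_by_language(examples: List[Example],
                         predictions: List[bool]) -> Dict[str, float]:
    """Return, per language, the fraction of predictions matching the gold label."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        total[example.language] += 1
        correct[example.language] += int(predicted == example.coherent)
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy data, invented purely for illustration.
examples = [
    Example("en", ["How was your trip?"], "It was great, thanks!", True),
    Example("en", ["How was your trip?"], "I prefer tea.", False),
    Example("fr", ["Tu viens ce soir ?"], "Oui, vers vingt heures.", True),
]
predictions = [True, True, True]  # hypothetical evaluator outputs

scores = accuracy_by_language(examples, predictions)
print(scores)
```

Grouping by language keeps a strong result in one language from masking a weak result in another, which matters when comparing a small evaluator against a larger teacher model across several languages.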
