Findings of the Fourth Shared Task on Multilingual Coreference Resolution: Can LLMs Dethrone Traditional Approaches?

Michal Novák; Miloslav Konopík; Anna Nedoluzhko; Martin Popel; Ondřej Pražák; Jakub Sido; Milan Straka; Zdeněk Žabokrtský; Daniel Zeman

TL;DR

The paper presents the fourth Shared Task on Multilingual Coreference Resolution, which introduces an LLM Track and expands the CorefUD-based data to 17 languages. It compares nine systems across two tracks, the LLM Track and the Unconstrained track of traditional systems. LLMs receive a plaintext encoding of the data, and all systems are evaluated with the CorefUD scorer, using CoNLL F1 as the primary metric. Key contributions include the integration of CorefUD 1.3 resources, data reductions, plaintext evaluation formats, and a diverse mix of end-to-end and pipeline-based systems, including the CorPipe ensembles. The results show that well-tuned traditional systems (e.g., the CorPipe ensembles) still outperform LLM-based approaches on most datasets. LLMs nonetheless show potential and surpass non-LLM systems on some datasets, which points to future improvements in prompting, data annotation, and the representation of coreference in language models.
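
The primary metric, CoNLL F1, is the unweighted average of the MUC, B³, and CEAF-e F1 scores. The Python sketch below computes it from gold (key) and predicted (response) clusters, each given as a collection of mention identifiers. It is illustrative only. It assumes exact mention matching, keeps singletons, and uses a brute-force CEAF-e alignment that is feasible only for toy inputs, whereas the official CorefUD scorer applies its own mention-matching and singleton rules. The function names are hypothetical.

```python
"""Illustrative CoNLL F1: the unweighted mean of MUC, B-cubed and CEAF-e F1.

Assumes exact mention matching and keeps singletons; the official CorefUD
scorer has its own matching and singleton rules. Names are hypothetical.
"""
from itertools import permutations


def f1(p, r):
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def muc(key, resp):
    """Link-based MUC; returns (precision, recall)."""
    def recall(gold, pred):
        num = den = 0
        for cluster in gold:
            # Partition the gold cluster by predicted clusters; mentions absent
            # from every predicted cluster count as separate parts.
            parts, missing = set(), 0
            for m in cluster:
                owner = next((i for i, c in enumerate(pred) if m in c), None)
                if owner is None:
                    missing += 1
                else:
                    parts.add(owner)
            num += len(cluster) - (len(parts) + missing)
            den += len(cluster) - 1
        return num / den if den else 0.0
    return recall(resp, key), recall(key, resp)


def b_cubed(key, resp):
    """Mention-based B-cubed; returns (precision, recall)."""
    def score(gold, pred):
        total, n = 0.0, 0
        for cluster in gold:
            for m in cluster:
                other = next((c for c in pred if m in c), frozenset())
                total += len(cluster & other) / len(cluster)
                n += 1
        return total / n if n else 0.0
    return score(resp, key), score(key, resp)


def ceaf_e(key, resp):
    """Entity-based CEAF with phi4 similarity; returns (precision, recall)."""
    def phi4(a, b):
        return 2 * len(a & b) / (len(a) + len(b))
    small, large = (key, resp) if len(key) <= len(resp) else (resp, key)
    # Brute-force the best one-to-one entity alignment (toy inputs only;
    # real scorers use the Hungarian algorithm).
    best = max(
        sum(phi4(small[i], large[j]) for i, j in enumerate(perm))
        for perm in permutations(range(len(large)), len(small))
    )
    precision = best / len(resp) if resp else 0.0
    recall = best / len(key) if key else 0.0
    return precision, recall


def conll_f1(key, resp):
    key = [frozenset(c) for c in key]
    resp = [frozenset(c) for c in resp]
    scores = [f1(*metric(key, resp)) for metric in (muc, b_cubed, ceaf_e)]
    return sum(scores) / 3


print(round(conll_f1([{1, 2, 3}], [{1, 2}, {3}]), 4))  # 0.6381
```

For example, splitting one gold three-mention entity into a two-mention and a one-mention cluster gives MUC F1 = 2/3, B³ F1 = 5/7, and CEAF-e F1 = 8/15, so CoNLL F1 = 67/105 ≈ 0.638. Averaging three metrics with different biases (links, mentions, entities) keeps a system from scoring well by gaming any single one.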

Abstract

The paper presents an overview of the fourth edition of the Shared Task on Multilingual Coreference Resolution, organized as part of the CODI-CRAC 2025 workshop. As in the previous editions, participants were challenged to develop systems that identify mentions and cluster them according to identity coreference. A key innovation of this year's task was the introduction of a dedicated Large Language Model (LLM) track, featuring a simplified plaintext format designed to be more suitable for LLMs than the original CoNLL-U representation. The task also expanded its coverage with three new datasets in two additional languages, using version 1.3 of CorefUD, a harmonized multilingual collection of 22 datasets in 17 languages. In total, nine systems participated, including four LLM-based approaches (two fine-tuned and two using few-shot adaptation). While traditional systems still held the lead, LLMs showed clear potential, suggesting they may soon challenge established approaches in future editions.
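
The abstract does not spell out the LLM-track plaintext format, so the sketch below uses a hypothetical inline bracket notation, not the shared task's actual specification, to illustrate the general idea. Instead of spreading coreference over CoNLL-U columns, each mention is wrapped as `[e1 ... ]` directly in the text, so a model can read and generate clusters as ordinary text. The `encode`/`decode` functions and the marker syntax are invented for illustration, and the sketch assumes properly nested (non-crossing) mention spans and tokens that never look like markers.

```python
"""Hypothetical inline bracket notation for coreference (illustration only)."""


def encode(tokens, mentions):
    """Wrap each mention as "[<entity> ... ]".

    mentions: (start, end, entity) triples with inclusive token indices.
    Spans may nest but must not cross, and tokens must not look like markers.
    """
    opens, closes = {}, {}
    # Open longer spans first so that nested mentions sit inside them.
    for start, end, ent in sorted(mentions, key=lambda m: (m[0], -m[1])):
        opens.setdefault(start, []).append(f"[{ent}")
        closes[end] = closes.get(end, 0) + 1
    out = []
    for i, tok in enumerate(tokens):
        out.extend(opens.get(i, []))
        out.append(tok)
        out.extend(["]"] * closes.get(i, 0))
    return " ".join(out)


def decode(text):
    """Invert encode(): recover tokens and (start, end, entity) mentions."""
    tokens, mentions, stack = [], [], []
    for piece in text.split():
        if piece.startswith("[") and len(piece) > 1:
            stack.append((len(tokens), piece[1:]))  # a mention opens here
        elif piece == "]":
            start, ent = stack.pop()  # close the most recently opened mention
            mentions.append((start, len(tokens) - 1, ent))
        else:
            tokens.append(piece)
    return tokens, mentions


tokens = ["Mary", "saw", "her", "dog", "."]
mentions = [(0, 0, "e1"), (2, 2, "e1"), (2, 3, "e2")]
text = encode(tokens, mentions)
print(text)  # [e1 Mary ] saw [e2 [e1 her ] dog ] .
```

A stack-based decoder suffices because nested mentions close in reverse order of opening. In an LLM setting the generated text must be parsed back into mentions before scoring, and a malformed output (for instance, an unmatched `]`) would make `stack.pop()` raise, so a practical parser would need explicit error handling.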
