ImCoref-CeS: An Improved Lightweight Pipeline for Coreference Resolution with LLM-based Checker-Splitter Refinement

Kangyang Luo; Yuzhuo Bai; Shuzheng Si; Cheng Gao; Zhitong Wang; Yingli Shen; Wenhao Li; Zhu Liu; Yufeng Han; Jiayi Wu; Cunliang Kong; Maosong Sun

TL;DR

ImCoref-CeS tackles coreference resolution by marrying a lightweight, high-performance supervised CR model with the reasoning capabilities of large language models. The framework enhances the supervised backbone (ImCoref) with a Long-Text Encoding Bridging Module, a biaffine end-to-end scoring mechanism, and Hybrid Mention Regularization to improve efficiency and long-range mention handling. An LLM-based Checker-Splitter is dynamically integrated during inference to validate mentions and refine clusters, using structured prompts and filtering to manage cost. Across OntoNotes, LitBank, and WikiCoref, ImCoref-CeS demonstrates superior coreference performance and better generalization, while offering practical trade-offs between accuracy and latency for real-world deployment.
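The inference-time flow described above (check candidate mentions, cluster the survivors, then split suspicious clusters) can be sketched as plain control flow. This is only an illustrative sketch, not the authors' implementation: the function name `refine_with_checker_splitter` and the three callables `is_valid_mention`, `cluster_mentions`, and `split_cluster` are hypothetical placeholders. In a real system the first and third would wrap structured LLM prompts and the second would be the supervised model's clustering step; none of those details are given in this summary.

```python
from typing import Callable, List

Mention = str
Cluster = List[Mention]


def refine_with_checker_splitter(
    candidates: List[Mention],
    is_valid_mention: Callable[[Mention], bool],               # hypothetical "Checker" role
    cluster_mentions: Callable[[List[Mention]], List[Cluster]],  # hypothetical supervised clusterer
    split_cluster: Callable[[Cluster], List[Cluster]],          # hypothetical "Splitter" role
) -> List[Cluster]:
    """Filter invalid mentions, cluster the rest, then split erroneous clusters."""
    # Step 1: the checker drops candidate mentions it judges invalid.
    valid = [m for m in candidates if is_valid_mention(m)]

    # Step 2: the supervised model groups the surviving mentions.
    clusters = cluster_mentions(valid)

    # Step 3: the splitter may break a cluster into several parts.
    refined: List[Cluster] = []
    for cluster in clusters:
        for part in split_cluster(cluster):
            if part:  # ignore empty parts returned by the splitter
                refined.append(part)
    return refined
```

Keeping the checker and splitter as injected callables means the LLM calls can be swapped for cheap rules or skipped for confident predictions, which is one plausible way to realize the cost filtering the summary mentions.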

Abstract

Coreference Resolution (CR) is a critical task in Natural Language Processing (NLP). Current research faces a key dilemma: whether to further explore the potential of supervised neural methods based on small language models, whose detect-then-cluster pipeline still delivers top performance, or to embrace the powerful capabilities of Large Language Models (LLMs). Effectively combining the strengths of both, however, remains underexplored. To this end, we propose ImCoref-CeS, a novel framework that integrates an enhanced supervised model with LLM-based reasoning. First, we present an improved CR method (ImCoref) that pushes the performance boundaries of the supervised neural approach by introducing a lightweight bridging module to enhance long-text encoding, devising a biaffine scorer to comprehensively capture positional information, and employing hybrid mention regularization to improve training efficiency. Second, we employ an LLM acting as a multi-role Checker-Splitter agent to validate the candidate mentions (filtering out invalid ones) and the coreference results (splitting erroneous clusters) predicted by ImCoref. Extensive experiments demonstrate the effectiveness of ImCoref-CeS, which achieves superior performance compared to existing state-of-the-art (SOTA) methods.
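To make the idea of a biaffine scorer concrete, the following is a minimal sketch of the generic biaffine form, a bilinear term plus a linear term over the concatenated pair plus a bias, applied to antecedent selection. It is not the scorer defined in the paper, whose exact formulation is not given in this abstract. The names `biaffine_score` and `best_antecedent`, the parameters `U`, `w`, and `b`, and the convention that a score of 0 means "no antecedent" are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def biaffine_score(h_i: Vector, h_j: Vector, U: Matrix, w: Vector, b: float) -> float:
    """Generic biaffine score: h_i^T U h_j + w^T [h_i; h_j] + b."""
    # Bilinear term: pairwise interaction between the two mention vectors.
    bilinear = sum(
        h_i[r] * U[r][c] * h_j[c]
        for r in range(len(h_i))
        for c in range(len(h_j))
    )
    # Linear term over the concatenation [h_i; h_j].
    linear = sum(weight * x for weight, x in zip(w, h_i + h_j))
    return bilinear + linear + b


def best_antecedent(mentions: List[Vector], j: int, U: Matrix, w: Vector, b: float) -> int:
    """Index of the highest-scoring earlier mention for mention j, or -1.

    Assumption: a score must exceed 0 to beat the "no antecedent" option.
    """
    best_idx, best_score = -1, 0.0
    for i in range(j):
        score = biaffine_score(mentions[i], mentions[j], U, w, b)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```

With toy vectors, an identity `U`, a zero `w`, and a negative bias, a mention links to an earlier mention only when their representations overlap enough to outweigh the bias.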
