Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends

Giuliano Martinelli, Edoardo Barba, Roberto Navigli

TL;DR

Coreference resolution has recently been dominated by large autoregressive models, which perform strongly but are costly to train and run on an academic budget. Maverick is an encoder-only pipeline that combines a novel mention extraction step, EOS regularization, and efficient mention pruning with three alternative clustering strategies (s2e, mes, incr), matching or exceeding the state of the art with far fewer resources. The model is trained with a multitask loss, L_start + L_end + L_clust, on top of a DeBERTa-v3 encoder, and training fits on a single RTX 4090. It reaches state-of-the-art results on OntoNotes (83.6 CoNLL-F1 with Maverick_mes) and remains robust on long-document and out-of-domain data, while training with as little as 0.006x the memory of Seq2Seq systems and running inference up to 170x faster. The open-source release and the efficient incremental variant, Maverick_incr, make the system accessible for downstream tasks and for research on coreference in data-scarce or otherwise challenging domains.
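
As a rough illustration of the multitask objective L_start + L_end + L_clust mentioned above, the sketch below sums three loss terms with equal weight. It assumes each term is a mean binary cross-entropy over predicted probabilities, which this summary does not state; the function names `binary_cross_entropy` and `multitask_loss` are hypothetical and are not taken from the released code.

```python
import math


def binary_cross_entropy(probs, labels):
    """Mean binary cross-entropy over paired probabilities and 0/1 labels.

    Assumption: each of the three loss terms is a BCE; the summary only
    names the terms, not their exact form.
    """
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(start, end, clust):
    """Unweighted sum L_start + L_end + L_clust.

    Each argument is a (probs, labels) pair for one sub-task:
    mention-start prediction, mention-end prediction, and clustering.
    """
    return sum(binary_cross_entropy(p, y) for p, y in (start, end, clust))


# Toy inputs: probabilities and gold labels are made up for illustration.
loss = multitask_loss(
    start=([0.9, 0.2], [1, 0]),
    end=([0.8, 0.1], [1, 0]),
    clust=([0.7], [1]),
)
```

Summing the terms without weights means the encoder receives gradients from mention detection and clustering at once; whether the released implementation weights or schedules the terms differently is not stated here.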

Abstract

Large autoregressive generative models have emerged as the cornerstone for achieving the highest performance across several Natural Language Processing tasks. However, the urge to attain superior results has, at times, led to the premature replacement of carefully designed task-specific approaches without exhaustive experimentation. The Coreference Resolution task is no exception; all recent state-of-the-art solutions adopt large generative autoregressive models that outperform encoder-based discriminative systems. In this work, we challenge this recent trend by introducing Maverick, a carefully designed yet simple pipeline that enables running a state-of-the-art Coreference Resolution system within the constraints of an academic budget, outperforming models with up to 13 billion parameters with as few as 500 million parameters. Maverick achieves state-of-the-art performance on the CoNLL-2012 benchmark, training with up to 0.006x the memory resources and obtaining 170x faster inference compared to previous state-of-the-art systems. We extensively validate the robustness of the Maverick framework with an array of diverse experiments, reporting improvements over prior systems in data-scarce, long-document, and out-of-domain settings. We release our code and models for research purposes at https://github.com/SapienzaNLP/maverick-coref.

Paper Structure (38 sections, 15 equations, 9 tables)