gec-metrics: A Unified Library for Grammatical Error Correction Evaluation
Takumi Goto, Yusuke Sakai, Taro Watanabe
TL;DR
gec-metrics is a unified, API-first library for grammatical error correction (GEC) evaluation. It addresses the fragmentation and poor reproducibility of existing metric implementations by consolidating ten metrics and two meta-evaluation frameworks under a common interface. The library offers a command-line interface (CLI), a Python API, and visualization tools. Together these support fair system comparisons, metric development, and meta-evaluation studies, with an emphasis on transparency and reproducibility. Extensive experiments, including LLM-based metrics and metric ensembling, show how a unified interface enables robust analysis, and they reveal that correlations with human judgments depend on context. By providing extensible abstractions, reproducible configurations, and accessible visualization, gec-metrics aims to make GEC evaluation more reliable and to encourage broad adoption across the community.
Abstract
We introduce gec-metrics, a library for using and developing grammatical error correction (GEC) evaluation metrics through a unified interface. Our library enables fair system comparisons by ensuring that everyone evaluates with the same, consistent implementation. Moreover, it is designed with a strong focus on API usage, which makes it highly extensible. It also includes meta-evaluation functionality and provides analysis and visualization scripts to support the development of GEC evaluation metrics. Our code is released under the MIT license and is also distributed as an installable package. The video is available on YouTube.
