CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News
Mengna Zhu, Zijie Xu, Kaisheng Zeng, Kaiming Xiao, Mao Wang, Wenjun Ke, Hongbin Huang
TL;DR
CMNEE introduces a large-scale, document-level event extraction dataset built from open-source Chinese military news, addressing data scarcity in this domain. It comprises 17,000 documents and 29,223 manually annotated events covering 8 event types and 11 argument roles. The annotations were produced with a two-stage, multi-turn workflow and explicit quality controls. A comprehensive benchmark across several model families shows that military-domain event extraction remains challenging, and that trigger information and co-referent arguments significantly affect performance. The publicly released dataset and code provide a foundation for research on multi-event, document-level extraction in specialized domains, and for practical military information extraction and decision-support applications.
Abstract
Extracting structured event knowledge, including event triggers and their corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military field suffers from data scarcity, which hinders research on event extraction models in this domain. To alleviate this problem, we propose CMNEE, a large-scale, document-level open-source Chinese Military News Event Extraction dataset. It contains 17,000 documents and 29,223 events, all manually annotated according to a pre-defined military-domain schema comprising 8 event types and 11 argument role types. We designed a two-stage, multi-turn annotation strategy to ensure the quality of CMNEE, and we reproduced several state-of-the-art event extraction models and evaluated them systematically. The results on CMNEE are noticeably lower than those on datasets from other domains, which shows that event extraction in the military domain poses unique challenges and requires further research. Our code and data are available at https://github.com/Mzzzhu/CMNEE.
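The abstract frames event extraction as recovering typed events, each with a trigger and role-labeled arguments, and reports a systematic evaluation of models on this task. The following is a minimal sketch of what such a record might look like in memory and how trigger-level and argument-level precision, recall, and F1 are commonly computed by exact tuple matching. It assumes a simplified representation: the class names, fields, and the placeholder type and role names (`TypeA`, `Role1`, and so on) are hypothetical and do not reflect CMNEE's actual file format or schema. This is also not the official CMNEE evaluation script, which may match arguments differently, for example when handling co-referent mentions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Argument:
    role: str  # hypothetical placeholder role name, not CMNEE's real role inventory
    text: str  # argument mention as it appears in the document

@dataclass
class Event:
    event_type: str  # hypothetical placeholder event type name
    trigger: str     # word or phrase that evokes the event
    arguments: list[Argument] = field(default_factory=list)

def trigger_tuples(events):
    """(event_type, trigger) pairs used for trigger-level scoring."""
    return {(e.event_type, e.trigger) for e in events}

def argument_tuples(events):
    """(event_type, role, text) triples used for argument-level scoring."""
    return {(e.event_type, a.role, a.text) for e in events for a in e.arguments}

def prf(pred, gold):
    """Micro precision, recall and F1 over exact-match tuple sets."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical document with two gold events; the model finds only the first
# event and gets one of its two arguments wrong.
gold = [
    Event("TypeA", "trigger-1", [Argument("Role1", "arg-1"), Argument("Role2", "arg-2")]),
    Event("TypeB", "trigger-2", [Argument("Role1", "arg-3")]),
]
pred = [
    Event("TypeA", "trigger-1", [Argument("Role1", "arg-1"), Argument("Role2", "wrong")]),
]

print(prf(trigger_tuples(pred), trigger_tuples(gold)))    # P=1.0, R=0.5
print(prf(argument_tuples(pred), argument_tuples(gold)))  # P=0.5, R≈0.333, F1≈0.4
```

Because the sketch scores sets of tuples, identical tuples from several events of the same type in one document collapse into one. Some document-level protocols instead align predicted events with gold events before scoring their arguments.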
