MLRIP: Pre-training a military language representation model with informative factual knowledge and professional knowledge base
Hui Li, Xuekang Yang
TL;DR
Comprehensive evaluations on military-speciffc NLP tasks show that MLRIP outperforms existing BERT-based models by substantial margins, establishing new state-of-the-art performance in military entity recognition, typing, and operational linkage extraction tasks while demonstrating superior operational efffciency in resource-constrained environments.
Abstract
Incorporating structured knowledge into pre-trained language models has demonstrated signiffcant bene-ffts for domain-speciffc natural language processing tasks, particularly in specialized ffelds like military intelligence analysis. Existing approaches typically integrate external knowledge through masking tech-niques or fusion mechanisms, but often fail to fully leverage the intrinsic tactical associations and factual information within input sequences, while introducing uncontrolled noise from unveriffed exter-nal sources. To address these limitations, we present MLRIP (Military Language Representation with Integrated Prior), a novel pre-training framework that introduces a hierarchical knowledge integration pipeline combined with a dual-phase entity substitu-tion mechanism. Our approach speciffcally models operational linkages between military entities, capturing critical dependencies such as command, support, and engagement structures. Comprehensive evaluations on military-speciffc NLP tasks show that MLRIP outperforms existing BERT-based models by substantial margins, establishing new state-of-the-art performance in military entity recognition, typing, and operational linkage extraction tasks while demonstrating superior operational efffciency in resource-constrained environments.
