
GerPS-Compare: Comparing NER methods for legal norm analysis

Sarah T. Bachinger, Christoph Unger, Robin Erd, Leila Feddoul, Clara Lachenmaier, Sina Zarrieß, Birgitta König-Ries

TL;DR

This work tackles NER for German legal norms by directly comparing rule-based, deep discriminative, and deep generative approaches on the ten-class GerPS-NER corpus. It finds that deep discriminative models consistently yield the strongest macro $F_1$-scores across most classes, while rule-based methods excel only on a notably difficult data field class; deep generative approaches underperform relative to discriminative models in this setting. The results underscore the challenges posed by heterogeneous class definitions and suggest that integrating rule-based and discriminative methods could offer practical gains for digitizing public administration norms. The study advances understanding of NER design choices in specialized legal domains and informs deployment strategies for real-world legal analytics pipelines.
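For reference, a sketch of how a macro $F_1$-score is commonly computed, assuming the usual definition as the unweighted mean of per-class $F_1$ values. The symbols $TP_c$, $FP_c$, and $FN_c$ (true positives, false positives, and false negatives for class $c$) and the class set $C$ are notation introduced here for illustration; the matching criterion the authors use to count a predicted span as correct is not stated in this summary.

$$
P_c = \frac{TP_c}{TP_c + FP_c}, \qquad
R_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F_{1,c} = \frac{2\,P_c\,R_c}{P_c + R_c}
$$

$$
\text{macro-}F_1 = \frac{1}{|C|} \sum_{c \in C} F_{1,c}, \qquad |C| = 10
$$

Because every class carries equal weight under this definition, a rare or difficult class affects the macro score as much as a frequent one.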

Abstract

We apply Named Entity Recognition (NER) to a particular sub-genre of German legal texts: legal norms regulating administrative processes in public service administration. Analyzing such texts involves identifying stretches of text that instantiate one of ten classes defined by public service administration professionals. We investigate and compare three methods for detecting these classes: a rule-based system, deep discriminative models, and a deep generative model. Our results show that the deep discriminative models outperform both the rule-based system and the deep generative model; the latter two perform roughly equally well overall, each outperforming the other in different classes. The main cause of this somewhat surprising result is arguably that the classes used in the analysis are semantically and syntactically heterogeneous, in contrast to the classes used in more standard NER tasks. Deep discriminative models appear to be better equipped to deal with this heterogeneity than both generic LLMs and human linguists designing rule-based NER systems.

Paper Structure

This paper contains 24 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Overview of our workflow for comparing multiple machine learning approaches.
  • Figure 2: Evaluation results for the different approaches by class and score type.