GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek

Lefteris Loukas, Nikolaos Smyrnioudis, Chrysa Dikonomaki, Spyros Barbakos, Anastasios Toumazatos, John Koutsikakis, Manolis Kyriakakis, Mary Georgiou, Stavros Vassos, John Pavlopoulos, Ion Androutsopoulos

TL;DR

This paper introduces GR-NLP-TOOLKIT, an open-source NLP toolkit tailored to Modern Greek that covers five core tasks: POS tagging, morphological tagging, dependency parsing, named entity recognition, and Greeklish-to-Greek transliteration. The approach centers on Transformer-based models (notably Greek-BERT) with task-specific heads, complemented by a ByT5-based Greeklish converter, and reports state-of-the-art results against multilingual toolkits on Greek UD benchmarks and on Greeklish-to-Greek transliteration. Key contributions include a pip-installable package, a HuggingFace demo, and a publicly documented Greek NLP API that supports non-Python usage, all aimed at broad accessibility and practical deployment. Because the toolkit is open source and ships with a demo and an API, it lowers the barrier to Greek NLP research and applications; planned enhancements such as toxicity detection and sentiment analysis would broaden its reach.

Abstract

We present GR-NLP-TOOLKIT, an open-source natural language processing (NLP) toolkit developed specifically for Modern Greek. The toolkit provides state-of-the-art performance in five core NLP tasks, namely part-of-speech tagging, morphological tagging, dependency parsing, named entity recognition, and Greeklish-to-Greek transliteration. The toolkit is based on pre-trained Transformers, is freely available, and can be easily installed in Python (pip install gr-nlp-toolkit). It is also accessible through a demonstration platform on HuggingFace, along with a publicly available API for non-commercial use. We discuss the functionality provided for each task, the underlying methods, experiments against comparable open-source toolkits, and possible future enhancements. The toolkit is available at: https://github.com/nlpaueb/gr-nlp-toolkit

Paper Structure

This paper contains 14 sections, 6 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: An example of a Greek sentence written in Greeklish. There is no consensus mapping: Greek characters may be replaced by Latin-keyboard characters based on visual similarity, phonetic similarity, shared keys, etc. Figure from Toumazatos et al. (2024).
  • Figure 2: A dependency tree generated by gr-nlp-toolkit for a Greek sentence whose English translation is "Manchester United was defeated by Atletico Bilbao with a 2:3 score." Figure from Smyrnioudis (2021). Tree drawn using spaCy's visualizer.
  • Figure 3: Example of gr-nlp-toolkit's demonstration space at https://huggingface.co/spaces/AUEB-NLP/greek-nlp-toolkit-demo. The example shows Greeklish-to-Greek transliteration, but the demo also provides access to the other functionalities (POS and morphological tagging, dependency parsing, NER).
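
The ambiguity described in the Figure 1 caption can be illustrated with a small sketch. This is not the toolkit's method (the toolkit uses a ByT5-based converter); it is a toy lookup table, and both the GREEKLISH_OPTIONS mapping and the candidate_spellings helper are hypothetical names introduced here for illustration. The table covers only a handful of letters and ignores accents and multi-character digraphs. It shows why a fixed character mapping is not enough: the Latin "i" can stand for ι, η, or υ, which sound alike in Modern Greek, so even a short word has several candidate spellings.

```python
from itertools import product

# Hypothetical, deliberately tiny mapping from Latin letters to the Greek
# letters they commonly stand for in Greeklish. Assumption for illustration
# only: real usage is far more varied (e.g. "h" or "w" chosen by visual
# similarity), and accents and digraphs are ignored here.
GREEKLISH_OPTIONS = {
    "a": ["α"],
    "e": ["ε"],
    "i": ["ι", "η", "υ"],  # three letters with the same sound
    "k": ["κ"],
    "l": ["λ"],
    "m": ["μ"],
    "o": ["ο", "ω"],       # two letters with the same sound
    "r": ["ρ"],
}


def candidate_spellings(greeklish_word):
    """Return every Greek spelling the toy table allows for a Greeklish word."""
    # Characters missing from the table are passed through unchanged.
    options = [GREEKLISH_OPTIONS.get(ch, [ch]) for ch in greeklish_word.lower()]
    # The Cartesian product enumerates one choice per input character.
    return ["".join(chars) for chars in product(*options)]


candidates = candidate_spellings("kalimera")
for spelling in candidates:
    print(spelling)
```

Only one of the printed candidates matches the unaccented form of the intended word (καλημέρα, "good morning"); picking it requires knowledge of the vocabulary and context, which is what a learned converter is meant to supply.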