MaiBaam Annotation Guidelines
Verena Blaschke, Barbara Kovačić, Siyao Peng, Barbara Plank
TL;DR
This document presents the annotation guidelines for MaiBaam, a Bavarian UD corpus. It covers preprocessing, tokenization, POS tagging, dependencies, and lemmas, and aligns with the general UD v2 and German UD guidelines. It specifies Bavarian-specific decisions for noun phrases, verbal morphology (notably -ma and the auxiliary tua), pronoun inflection, and relative clauses, aiming for consistent, linguistically informed annotations. The document also lists general rules and discusses possible future updates motivated by ongoing UD refinements, balancing dialectal variation against UD consistency. The guidelines are intended to yield comparable Bavarian annotations, may carry over to closely related German varieties, and can support downstream NLP tasks and linguistic analysis.
Abstract
This document provides the annotation guidelines for MaiBaam, a Bavarian corpus manually annotated with part-of-speech (POS) tags, syntactic dependencies, and German lemmas. MaiBaam is part of the Universal Dependencies (UD) project, and our annotations elaborate on the general and German UD version 2 guidelines. In this document, we detail how to preprocess and tokenize Bavarian data, provide an overview of the POS tags and dependencies we use, explain annotation decisions that would also apply to closely related languages like German, and introduce and motivate decisions that are specific to Bavarian grammar.
