Multilingual Power and Ideology Identification in the Parliament: a Reference Dataset and Simple Baselines
Çağrı Çöltekin, Matyáš Kopp, Katja Meden, Vaidas Morkevicius, Nikola Ljubešić, Tomaž Erjavec
TL;DR
The paper presents a reference dataset for multilingual analysis of parliamentary discourse, focusing on political orientation (left-right) and power position (governing vs opposition) derived from ParlaMint speeches. It uses party membership as the labeling proxy and introduces careful sampling to minimize covariate leakage, includes English translations, and provides a simple TF-IDF character n-gram baseline with logistic regression. Key contributions include detailed data construction decisions, cross-parliament statistics, and an initial baseline that informs future shared-task improvements. This resource enables quantitative, cross-country analyses of ideology and power in multilingual contexts and supports baseline comparisons and transfer-learning explorations across languages.
Abstract
We introduce a dataset for political orientation and power position identification. The dataset is derived from ParlaMint, a set of comparable corpora of transcribed parliamentary speeches from 29 national and regional parliaments. We describe the dataset, explain the reasoning behind some of the choices made during its creation, and present summary statistics. Using a simple classifier, we also report baseline results on two tasks: predicting political orientation on the left-to-right axis, and identifying power position, i.e., distinguishing speeches delivered by members of governing coalition parties from those delivered by members of opposition parties.
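To make the baseline described in the TL;DR concrete (TF-IDF over character n-grams, fed to logistic regression), the following is a minimal, dependency-free sketch of the power position task. It is an illustration under stated assumptions and not the authors' implementation: the n-gram range (1 to 3), the learning rate, the regularisation strength, the label encoding (1 = opposition, 0 = governing coalition), and the four toy speeches are all invented here. In practice one would more likely use an existing library such as scikit-learn (`TfidfVectorizer` with a character analyzer, followed by `LogisticRegression`).

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max (lowercased)."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_idf(docs):
    """Learn a smoothed inverse document frequency for each n-gram."""
    df = Counter(g for d in docs for g in set(char_ngrams(d)))
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf(doc, idf):
    """Map a document to an L2-normalised sparse TF-IDF vector (a dict)."""
    tf = Counter(g for g in char_ngrams(doc) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def predict_proba(vec, weights, bias):
    """Probability of the positive class under the logistic model."""
    z = bias + sum(weights.get(g, 0.0) * v for g, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(vectors, labels, epochs=200, lr=1.0, l2=1e-3):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = predict_proba(vec, weights, bias) - y
            bias -= lr * err
            for g, v in vec.items():
                w = weights.get(g, 0.0)
                # Gradient of the log loss plus an L2 penalty on active features.
                weights[g] = w - lr * (err * v + l2 * w)
    return weights, bias


# Invented toy speeches; assumed encoding: 1 = opposition, 0 = governing coalition.
train_texts = [
    "The government has failed the people and must resign.",
    "We are proud of the reforms our government has delivered.",
    "This minister refuses to answer our questions.",
    "Our coalition will continue to invest in schools and hospitals.",
]
train_labels = [1, 0, 1, 0]

idf = fit_idf(train_texts)
train_vectors = [tfidf(t, idf) for t in train_texts]
weights, bias = train_logreg(train_vectors, train_labels)

new_speech = "Why does the minister refuse to answer?"
print(predict_proba(tfidf(new_speech, idf), weights, bias))
```

Character n-grams need no language-specific tokenizer, which is one reason such a baseline is convenient in a multilingual setting. The same sketch would apply to the left-right orientation task by swapping in orientation labels.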
