Strategies for political-statement segmentation and labelling in unstructured text
Dmitry Nikolaev, Sean Papay
TL;DR
The paper addresses the joint segmentation of raw political text and labelling of the resulting statements with MARPOR categories, enabling scalable, cross-domain analysis. It compares three unified models (a CRF on top of a multilingual encoder, fine-tuned Flan-T5, and in-context learning with Llama 3.1) on manifestos and parliamentary debates, highlighting trade-offs between accuracy and compute. The CRF approach offers efficient, competitive performance when statement boundaries are not provided, while fine-tuned text-to-text models yield higher in-domain accuracy at substantial compute cost; in-context learning with constrained decoding is currently less effective. The authors demonstrate the method's applicability on UK Hansard debates, revealing interpretable party trajectories, and emphasise the need for faster inference to scale to large corpora.
Abstract
Analysis of parliamentary speeches and political-party manifestos has become an integral area of the computational study of political texts. While speeches have been overwhelmingly analysed using unsupervised methods, a large corpus of manifestos with by-statement political-stance labels has been created by the participants of the MARPOR project. It has recently been shown that these labels can be predicted by a neural model; however, the current approach relies on provided statement boundaries, which limits its out-of-domain applicability. In this work, we propose and test a range of unified split-and-label frameworks (based on linear-chain CRFs, fine-tuned text-to-text models, and the combination of in-context learning with constrained decoding) that can be used to jointly segment and classify statements from raw textual data. We show that our approaches achieve competitive accuracy when applied to the raw text of political manifestos, and then demonstrate the research potential of our method by applying it to the records of the UK House of Commons and tracing the political trajectories of four major parties over the last three decades.
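To make the split-and-label framing concrete, the following is a minimal sketch of one common way to cast joint segmentation and classification as per-token tagging, which is the kind of output a linear-chain CRF can produce. The abstract does not specify the tag scheme, so the BIO-style encoding here is an assumption, the function names `spans_to_tags` and `tags_to_spans` are hypothetical, and the category codes are illustrative placeholders rather than verified MARPOR labels.

```python
from typing import List, Tuple


def spans_to_tags(spans: List[Tuple[int, str]]) -> List[str]:
    """Encode statements, given as (length_in_tokens, label), as one tag per token.

    The first token of each statement gets "B-<label>", the rest "I-<label>".
    """
    tags: List[str] = []
    for length, label in spans:
        if length <= 0:
            raise ValueError("each statement must cover at least one token")
        tags.append(f"B-{label}")
        tags.extend([f"I-{label}"] * (length - 1))
    return tags


def tags_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decode per-token tags into (start, end_exclusive, label) statements.

    A new statement starts at every "B-" tag, or when the label changes.
    """
    spans: List[Tuple[int, int, str]] = []
    start = None
    label = None
    for i, tag in enumerate(tags):
        prefix, _, tag_label = tag.partition("-")
        if prefix == "B" or tag_label != label:
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag_label
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


# Illustrative input: one raw sentence containing two statements.
tokens = "We will expand public healthcare and build new railways".split()
# Placeholder category codes, assumed for illustration only.
tags = spans_to_tags([(5, "504"), (4, "411")])
for start, end, label in tags_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```

Because every statement opens with a `B-` tag, two adjacent statements with the same category remain distinguishable, so a single tag sequence carries both the boundaries and the labels.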
