Language Models Learn Metadata: Political Stance Detection Case Study
Stanley Cao, Felix Drinkall
TL;DR
The paper tackles political stance detection in parliamentary debates and analyzes how metadata should be integrated to predict a speaker's vote on a motion. It compares a metadata-focused Naive Bayes baseline, transformer-based fine-tuning, two hybrid strategies (prepending metadata as tokens to the input, and concatenating metadata-derived probabilities), and GPT-4o prompting. Metadata-enhanced approaches outperform the prior state of the art (SOTA) on ParlVote+, and prepending party and policy metadata to the input often provides the strongest gains. Even the simple party-only Naive Bayes baseline performs strongly, while GPT-4o prompting yields moderate gains. The results suggest that metadata can be a highly informative signal and that simpler, metadata-aware designs can surpass more complex architectures, with implications for metadata usage in NLP tasks beyond political discourse.
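With party as its only feature, a Naive Bayes model reduces to estimating P(vote | party) from counts, which is why such a baseline is cheap yet can be strong when party discipline is high. The sketch below is a minimal illustration under that assumption, not the authors' implementation: the function names `fit_party_nb` and `predict_proba`, the Laplace smoothing parameter `alpha`, and the toy labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_party_nb(parties, votes, alpha=1.0):
    """Hypothetical helper: categorical Naive Bayes with party as the only feature."""
    classes = sorted(set(votes))
    party_vocab = sorted(set(parties))
    class_counts = Counter(votes)
    joint = defaultdict(Counter)  # joint[vote][party] = number of co-occurrences
    for party, vote in zip(parties, votes):
        joint[vote][party] += 1
    n = len(votes)
    log_prior = {c: math.log(class_counts[c] / n) for c in classes}
    # Laplace-smoothed log P(party | vote)
    log_lik = {
        c: {
            p: math.log((joint[c][p] + alpha) / (class_counts[c] + alpha * len(party_vocab)))
            for p in party_vocab
        }
        for c in classes
    }
    return classes, log_prior, log_lik


def predict_proba(model, party):
    """Return P(vote | party) for each class, normalized in log space."""
    classes, log_prior, log_lik = model
    # An unseen party contributes no likelihood term, so the result falls back to the prior.
    scores = {c: log_prior[c] + log_lik[c].get(party, 0.0) for c in classes}
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}
```

For example, `predict_proba(fit_party_nb(["A", "A", "A", "B", "B"], [1, 1, 0, 0, 0]), "A")` gives roughly 0.56 for class 1. Per-class probabilities like these are one form the "metadata-derived probabilities" in a concatenation-style hybrid could take; exactly how the paper combines them with the text model is not shown here.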
Abstract
Stance detection is a crucial NLP task with numerous applications in social science, from analyzing online discussions to assessing political campaigns. This paper investigates the optimal way to incorporate metadata into a political stance detection task. We demonstrate that previous methods combining metadata with language-based data for political stance detection have not fully utilized the metadata information; our simple baseline, using only party membership information, surpasses the current state-of-the-art. We then show that prepending metadata (e.g., party and policy) to political speeches performs best, outperforming all baselines, indicating that complex metadata inclusion systems may not learn the task optimally.
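Prepending metadata amounts to turning party and policy into plain text that sits in front of the speech, so a fine-tuned transformer sees them as ordinary input tokens. The sketch below assumes a hypothetical `Speech` record and a hypothetical "party: … | policy: …" prefix format; the paper's actual template and tokenization are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Speech:
    text: str
    party: str
    policy: str  # hypothetical field: policy label attached to the motion


def prepend_metadata(speech: Speech, max_words: Optional[int] = None, sep: str = " | ") -> str:
    """Hypothetical helper: place metadata before the speech text as plain tokens."""
    prefix = f"party: {speech.party}{sep}policy: {speech.policy}{sep}"
    text = prefix + speech.text
    if max_words is not None:
        # Crude word-level stand-in for right-side truncation: the end of the speech
        # is cut first, so the prepended metadata always survives.
        text = " ".join(text.split()[:max_words])
    return text
```

For instance, `prepend_metadata(Speech("I support this motion", "Labour", "Health"))` returns `"party: Labour | policy: Health | I support this motion"`. Placing metadata at the front rather than the end means that, when long speeches are truncated from the right, the metadata is never the part that gets discarded.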
