Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study
Myrthe Reuver, Suzan Verberne, Antske Fokkens
TL;DR
This work examines the robustness of few-shot cross-topic stance detection by preregistering hypotheses and systematically comparing three modelling choices: stance task definition (Pro/Con versus Same Side Stance), input encoding (bi-encoding versus cross-encoding), and auxiliary task knowledge from Natural Language Inference (NLI). Using RoBERTa-based models trained on 100-example shots from seven datasets, the study finds that the Same Side Stance definition often improves cross-topic robustness, but the effect is dataset-dependent. Cross-encoding generally outperforms bi-encoding for Same Side Stance Classification (SSSC), although bi-encoding performs better in some other settings. Adding NLI pre-training yields substantial gains in several configurations but not across all datasets, which highlights strong dataset- and method-dependent variability. The authors argue that multi-dataset, systematic experimentation is necessary to identify robust stance modelling choices, with implications for building viewpoint-diverse news recommender systems that generalize across topics and domains.
Abstract
For a viewpoint-diverse news recommender, identifying whether two news articles express the same viewpoint is essential. One way to determine "same or different" viewpoint is stance detection. In this paper, we investigate the robustness of operationalization choices for few-shot stance detection, with special attention to modelling stance across different topics. Our experiments test preregistered hypotheses on stance detection. Specifically, we compare two stance task definitions (Pro/Con versus Same Side Stance), two LLM architectures (bi-encoding versus cross-encoding), and the addition of Natural Language Inference (NLI) knowledge, using pre-trained RoBERTa models trained on shots of 100 examples from 7 different stance detection datasets. Some of our hypotheses and claims from earlier work can be confirmed, while others give more inconsistent results. The effect of the Same Side Stance definition on performance differs per dataset and is influenced by other modelling choices. We found no relationship between the number of training topics in the training shots and performance. In general, cross-encoding outperforms bi-encoding, and adding NLI training to our models gives considerable improvement, but these results are not consistent across all datasets. Our results indicate that it is essential to include multiple datasets and systematic modelling experiments when aiming to find robust modelling choices for the concept 'stance'.
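To make the difference between the two task definitions concrete, the following is a minimal sketch of how Pro/Con-labelled items could be recast as Same Side Stance pairs. It is an illustration only, not the authors' data pipeline: the function name `same_side_pairs`, the tuple layout, the example texts, and the choice to pair items only within the same topic are all assumptions made for this example.

```python
from itertools import combinations


def same_side_pairs(examples):
    """Turn (topic, text, stance) items into Same Side Stance pairs.

    Assumption for this sketch: only items on the same topic are paired.
    The label is True when both items carry the same Pro/Con stance.
    """
    pairs = []
    for (topic_a, text_a, stance_a), (topic_b, text_b, stance_b) in combinations(examples, 2):
        if topic_a != topic_b:
            continue  # skip pairs that span two topics
        pairs.append((text_a, text_b, stance_a == stance_b))
    return pairs


# Hypothetical Pro/Con-labelled items, invented for illustration.
examples = [
    ("nuclear energy", "Reactors provide low-carbon power.", "pro"),
    ("nuclear energy", "Waste storage remains unsolved.", "con"),
    ("nuclear energy", "Nuclear plants run regardless of weather.", "pro"),
    ("school uniforms", "Uniforms limit self-expression.", "con"),
]

pairs = same_side_pairs(examples)
for text_a, text_b, same_side in pairs:
    print(same_side, "|", text_a, "|", text_b)
```

In the Pro/Con definition a model labels each text relative to a topic, whereas in the Same Side definition it only has to decide whether two texts agree with each other, which is why the pair label here never refers to the topic itself.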
