TATA: Stance Detection via Topic-Agnostic and Topic-Aware Embeddings
Hans W. A. Hanley, Zakir Durumeric
TL;DR
Stance detection is hard to generalize because a passage's stance often depends on its topic, and models must handle topics unseen during training. The paper introduces TATA, a two-branch architecture that learns topic-aware (TAW) embeddings through triplet-loss pretraining on a dedicated TAW dataset and topic-agnostic (TAG) embeddings through contrastive learning on an augmented VAST dataset, then fuses the two with a specialized attention mechanism for stance prediction. It contributes a TAW dataset of 110,000 quadruples and an augmented VAST dataset of 743,644 examples, and reports state-of-the-art results on VAST in both the Zero-shot ($F_1 = 0.771$) and Few-shot ($F_1 = 0.741$) settings, with competitive performance on SEM16t6. These results suggest that separating topic-specific from general stance features, and then combining them through TATA's attention, improves robustness to unseen topics in practical stance classification.
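To make the triplet-loss idea behind the TAW branch concrete, the following is a minimal sketch of a generic margin-based triplet loss on toy vectors. It is not the authors' implementation: the function names, the Euclidean distance, the margin of 1.0, and the example vectors are all assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (hypothetical helper, not from the paper).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it grows linearly.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy embeddings (assumed values): the positive shares the anchor's topic,
# the negative covers a different topic.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negative = [0.0, 1.0]

loss_easy = triplet_loss(anchor, positive, negative)  # well-separated triplet
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped on purpose
```

In this toy case the well-separated triplet incurs no loss, while the swapped triplet is penalized, which is the pressure that pulls same-topic embeddings together during pretraining.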
Abstract
Stance detection is important for understanding different attitudes and beliefs on the Internet. However, because a passage's stance toward a topic is often highly dependent on that topic, building a stance detection model that generalizes to unseen topics is difficult. In this work, we propose using contrastive learning, together with an unlabeled dataset of news articles covering a variety of topics, to train topic-agnostic (TAG) and topic-aware (TAW) embeddings for use in downstream stance detection. Combining these embeddings in our full TATA model, we achieve state-of-the-art performance across several public stance detection datasets (0.771 $F_1$-score on the Zero-shot VAST dataset). We release our code and data at https://github.com/hanshanley/tata.
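The abstract does not specify which contrastive objective is used, so the sketch below shows one common choice, an InfoNCE-style loss over cosine similarities, purely as an illustration of what contrastive learning optimizes. The function names, the temperature of 0.1, and the toy vectors are assumptions, not details from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed stand-in objective).

    The loss is small when the anchor is much more similar to the positive
    than to any of the negatives.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (assumed values): in a stance setting, the positive could be
# an example with the same stance label and the negatives ones with other labels.
anchor = [1.0, 0.0]
same_stance = [1.0, 0.1]
other_stance = [0.0, 1.0]
opposite = [-1.0, 0.0]

loss_good = info_nce(anchor, same_stance, [other_stance, opposite])
loss_bad = info_nce(anchor, other_stance, [same_stance, opposite])
```

Minimizing a loss of this shape pulls examples treated as positives together and pushes the negatives apart, regardless of which topic each example discusses.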
