TATA: Stance Detection via Topic-Agnostic and Topic-Aware Embeddings

Hans W. A. Hanley, Zakir Durumeric

TL;DR

Stance detection is challenged by topic dependence and the need to generalize to unseen topics. The paper introduces TATA, a two-branch architecture that learns two kinds of embeddings separately: topic-aware (TAW) embeddings, pretrained with a triplet loss on a dedicated TAW dataset, and topic-agnostic (TAG) embeddings, trained with contrastive learning on an augmented VAST dataset. A specialized attention mechanism then fuses the two for stance prediction. The paper contributes a TAW dataset of 110,000 quadruples and an augmented VAST dataset of 743,644 examples, and reports state-of-the-art results on VAST in both the Zero-shot ($F_1 = 0.771$) and Few-shot ($F_1 = 0.741$) settings, with competitive performance on SEM16t6. These results indicate that disentangling topic-specific from general stance features and integrating them via TATA's attention improves robustness to unseen topics, a practical gain for real-world stance classification.
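As a rough illustration of the triplet-loss idea mentioned above, the following is a minimal sketch of a generic hinge-style triplet objective. The Euclidean distance, the margin value, the function names, and the toy vectors are all assumptions made for illustration; this summary does not specify the paper's exact formulation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (an assumed hyperparameter).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D "topic embeddings" (hypothetical values, not from the paper).
anchor = [1.0, 0.0]
same_topic = [0.9, 0.1]
other_topic = [0.0, 1.0]

# Same-topic example is already much closer than the other-topic one.
well_separated = triplet_loss(anchor, same_topic, other_topic)

# Swapping the roles violates the margin, so the loss is positive.
violating = triplet_loss(anchor, other_topic, same_topic)
```

In this toy setup the first triplet already satisfies the margin and contributes no loss, while the swapped triplet is penalized; minimizing such a loss pulls same-topic embeddings together and pushes different-topic embeddings apart.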

Abstract

Stance detection is important for understanding different attitudes and beliefs on the Internet. However, given that a passage's stance toward a given topic is often highly dependent on that topic, building a stance detection model that generalizes to unseen topics is difficult. In this work, we propose using contrastive learning as well as an unlabeled dataset of news articles that cover a variety of different topics to train topic-agnostic/TAG and topic-aware/TAW embeddings for use in downstream stance detection. Combining these embeddings in our full TATA model, we achieve state-of-the-art performance across several public stance detection datasets (0.771 $F_1$-score on the Zero-shot VAST dataset). We release our code and data at https://github.com/hanshanley/tata.
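To make the contrastive-learning idea in the abstract concrete, here is a minimal sketch of a generic supervised contrastive loss over stance-labeled embeddings. It is not taken from the paper or its code release: the cosine similarity, the temperature value, the function names, and the toy data are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average, over anchors that have at least one same-label example,
    of -log softmax similarity to each same-label example."""
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Normalizer over every other example in the batch.
        denom = sum(
            math.exp(cosine(embeddings[i], embeddings[k]) / temperature)
            for k in range(n)
            if k != i
        )
        anchor_loss = 0.0
        for j in positives:
            numer = math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
            anchor_loss += -math.log(numer / denom)
        total += anchor_loss / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy embeddings (hypothetical): two tight groups in 2-D.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

# Labels that agree with the geometry versus labels that cut across it.
clustered_loss = supervised_contrastive_loss(
    toy_embeddings, ["pro", "pro", "against", "against"]
)
mixed_loss = supervised_contrastive_loss(
    toy_embeddings, ["pro", "against", "pro", "against"]
)
```

When same-stance examples already sit close together the loss is near zero, and when stance labels cut across the geometry it is large, which is the pressure that encourages stance clusters to form regardless of topic.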

Paper Structure (23 sections, 6 equations, 2 figures, 8 tables)


Figures (2)

  • Figure 1: TATA Model.
  • Figure 2: As we train the topic-agnostic/TAG layer of our model on our augmented VAST training set, clear Pro, Against, and Neutral clusters appear in the t-SNE of the embeddings of the VAST validation dataset, although the clusters do not separate perfectly (illustrating the need for additional features). As confirmed elsewhere (Allaway and McKeown, 2020; Liang et al., 2022), the Neutral category of examples is the most differentiable from the Pro and Against categories.