Polarization Detection on Social Networks: Dual Contrastive Objectives for Self-Supervision
Hang Cui, Tarek Abdelzaher
TL;DR
This paper addresses polarization detection on social networks by introducing DocTra, a unified self-supervised framework with dual contrastive objectives: an interaction-level objective that contrasts positive and negative interactions (including polarization-induced silence) and a feature-level objective that decouples polarized from invariant features. It provides an efficient solver, supports semi-supervised and prompt-tuning supervision, and proposes a unified polarization index that quantifies polarization while normalizing background engagement and mitigating outliers. Empirical results on seven public datasets show significant improvements over eight baselines and demonstrate robustness to varying edge types, signs, and noise. The work advances polarization analysis by delivering a generalizable, self-supervised approach with practical utility for clustering, classification, and cross-dataset comparison.
Abstract
Echo chambers and online discourses have become prevalent social phenomena in which communities engage in dramatic intra-group confirmation and inter-group hostility. Polarization detection is a rising research topic concerned with detecting and identifying such polarized groups. Previous works on polarization detection primarily focus on hand-crafted features derived from dataset-specific characteristics and prior knowledge, which fail to generalize to other datasets. This paper proposes a unified self-supervised polarization detection framework that outperforms previous methods in unsupervised and semi-supervised polarization detection tasks on various publicly available datasets. Our framework utilizes a dual contrastive objective (DocTra): (1) interaction-level: to contrast between node interactions to extract critical features of interaction patterns, and (2) feature-level: to contrast extracted polarized and invariant features to encourage feature decoupling. Our experiments extensively evaluate our method against 7 baselines on 7 public datasets, demonstrating significant performance improvements.
