Achieving Semantic Consistency: Contextualized Word Representations for Political Text Analysis

Ruiyu Zhang, Lin Nie, Ce Zhao, Qingyang Chen

TL;DR

This paper addresses semantic stability and drift in political text analysis by comparing static Word2Vec with contextual BERT. It uses 2004–2023 People's Daily articles, training yearly Word2Vec models with Skip-gram aligned by Orthogonal Procrustes and applying BERT-base-Chinese to obtain contextual embeddings. It introduces four metrics—$SD$, $MTS$, $RSC$, and $LNS$—to quantify long-term stability and local neighborhood consistency across four time windows (3, 5, 10, 20 years). Results show BERT yields higher $SD$ and $MTS$, lower $RSC$, and higher $LNS$, indicating stable representations and meaningful semantic evolution. The findings support contextual embeddings as a robust foundation for longitudinal political text analysis, while noting computational costs and the potential for hybrid approaches to balance stability with sensitivity.
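The alignment step mentioned above, Orthogonal Procrustes, can be illustrated with a minimal sketch. It is not the authors' code: the function name `procrustes_align` and the random toy matrices are assumptions for illustration only, standing in for two yearly embedding matrices over a shared vocabulary. The idea is to find the orthogonal matrix that best rotates one year's embedding space onto another's, so that vectors from different years become comparable.

```python
import numpy as np


def procrustes_align(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix R minimizing ||source @ R - target||_F.

    Both inputs have shape (n_words, dim), with rows for the same words
    in the same order (hypothetical shared vocabulary).
    """
    # Closed-form solution: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Toy data (assumption): "year A" embeddings and a rotated copy as "year B".
rng = np.random.default_rng(0)
year_a = rng.normal(size=(50, 8))
random_rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
year_b = year_a @ random_rotation  # same geometry, different coordinate system

# Rotate year B into year A's coordinate system.
rotation = procrustes_align(year_b, year_a)
year_b_aligned = year_b @ rotation

# After alignment, the same word has (nearly) the same vector in both years.
max_gap = float(np.abs(year_b_aligned - year_a).max())
print(f"largest coordinate gap after alignment: {max_gap:.2e}")
```

In this toy case the two spaces differ only by a rotation, so alignment recovers the original vectors almost exactly. With real yearly models, the residual difference after alignment is what a drift measure would pick up.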

Abstract

Accurately interpreting words is vital in political science text analysis; some tasks require assuming semantic stability, while others aim to trace semantic shifts. Traditional static embeddings, like Word2Vec, effectively capture long-term semantic changes but often lack stability in short-term contexts due to embedding fluctuations caused by unbalanced training data. BERT, which features a transformer-based architecture and contextual embeddings, offers greater semantic consistency, making it suitable for analyses in which stability is crucial. This study compares Word2Vec and BERT using 20 years of People's Daily articles to evaluate their performance in semantic representations across different timeframes. The results indicate that BERT outperforms Word2Vec in maintaining semantic stability while still recognizing subtle semantic variations. These findings support BERT's use in text analysis tasks that require stability, where semantic changes are not assumed, offering a more reliable foundation than static alternatives.

Paper Structure

This paper contains 6 sections, 4 equations, 3 figures, 2 tables.
