Word Embeddings Track Social Group Changes Across 70 Years in China
Yuxi Ma, Yongqian Peng, Yixin Zhu
TL;DR
This study asks how official Chinese discourse encoded social groups over the period 1950–2019 and whether these representations differ from Western patterns, especially during periods of radical social transformation. It combines diachronic word embeddings trained on the People's Daily (at annual and decade resolutions) with a secondary Google Books Chinese corpus, quantifying group-trait associations via MAC and DiffMAC and supplementing them with an event-centric WEAT framework. The authors report persistent asymmetries in valence across gender, ethnicity, age, and body type: ethnicity and age patterns remain relatively stable, while gender and economic status undergo dramatic reversals linked to historical events such as the Cultural Revolution and the post-1978 reforms. The work provides a non-Western perspective on how state-sanctioned language encodes social structure, offers methodological innovations for temporal linguistic analysis, and highlights the complex interplay between ideology and social change in China.
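As a rough illustration of the WEAT component named in the summary, the minimal sketch below computes the standard WEAT effect size, that is, the differential cosine association of two target word sets with two attribute word sets. It is not the authors' implementation: the function names, the toy 2-D vectors, and the choice of the sample standard deviation (`ddof=1`) are assumptions made for illustration, and neither the event-centric variant nor the MAC and DiffMAC measures are reproduced here.

```python
import numpy as np


def _unit(m):
    """Return the rows of m scaled to unit length (for cosine similarity)."""
    m = np.asarray(m, dtype=float)
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def association(W, A, B):
    """For each row w of W: mean cos(w, A) minus mean cos(w, B)."""
    W, A, B = _unit(W), _unit(A), _unit(B)
    return (W @ A.T).mean(axis=1) - (W @ B.T).mean(axis=1)


def weat_effect_size(X, Y, A, B):
    """Standard WEAT effect size for targets X, Y and attributes A, B.

    Assumption: the denominator uses the sample standard deviation
    (ddof=1) over all target words; some implementations use ddof=0.
    """
    s_x = association(X, A, B)
    s_y = association(Y, A, B)
    pooled = np.concatenate([s_x, s_y])
    return (s_x.mean() - s_y.mean()) / pooled.std(ddof=1)


# Hypothetical toy embeddings (2-D) standing in for real word vectors.
A = np.array([[1.0, 0.0], [0.9, 0.1]])   # attribute set A (e.g. pleasant words)
B = np.array([[0.0, 1.0], [0.1, 0.9]])   # attribute set B (e.g. unpleasant words)
X = np.array([[1.0, 0.1], [0.8, 0.2]])   # target group X, lies closer to A
Y = np.array([[0.1, 1.0], [0.2, 0.8]])   # target group Y, lies closer to B

d = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {d:.3f}")
```

In a diachronic setting, one would presumably repeat this computation on the embedding space of each year or decade and compare the resulting effect sizes over time. A positive value indicates that X is more strongly associated with A than Y is.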
Abstract
Language encodes societal beliefs about social groups through word patterns. While computational methods like word embeddings enable quantitative analysis of these patterns, studies have primarily examined gradual shifts in Western contexts. We present the first large-scale computational analysis of Chinese state-controlled media (1950–2019) to examine how revolutionary social transformations are reflected in official linguistic representations of social groups. Using diachronic word embeddings at multiple temporal resolutions, we find that Chinese representations differ significantly from Western counterparts, particularly regarding economic status, ethnicity, and gender. These representations show distinct evolutionary dynamics: while stereotypes of ethnicity, age, and body type remain remarkably stable across political upheavals, representations of gender and economic classes undergo dramatic shifts tracking historical transformations. This work advances our understanding of how officially sanctioned discourse encodes social structure through language while highlighting the importance of non-Western perspectives in computational social science.
