Word Embedding for Social Sciences: An Interdisciplinary Survey

Akira Matsui; Emilio Ferrara

TL;DR

The paper addresses the fragmentation of the word-embedding literature across the social sciences and proposes an integrative survey with a taxonomy centered on word2vec applications. It categorizes the surveyed studies by research topic and by nine analysis-method labels, including Pre-trained models, Overfitting, Working Variables, Reference Words, and Non-text data usage, and grounds the discussion mathematically in skip-gram with negative sampling (SGNS). A simple experiment demonstrates that cosine similarity and Euclidean distance can yield different results, underscoring the importance of metric choice. The work highlights a shift toward non-text data and emphasizes cross-disciplinary communication to improve methodological clarity and transferability. Together, the taxonomy and empirical insights offer practical guidance for social scientists applying word embeddings and for method developers refining alignment and interpretation across domains.

Abstract

To extract essential information from complex data, computer scientists have been developing machine learning models that learn low-dimensional representations. Both computer scientists and social scientists have benefited from these advances, because human behavior and social phenomena are embedded in complex data. However, this emerging trend is not well documented because different social science fields rarely cover each other's work, resulting in fragmented knowledge in the literature. To document this emerging trend, we survey recent studies that apply word embedding techniques to human behavior mining. We built a taxonomy to illustrate the methods and procedures used in the surveyed papers, helping social science researchers contextualize their research within the literature on word embedding applications. We also conduct a simple experiment to warn that similarity measures commonly used in the literature can yield different results even when they agree at an aggregate level.
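The point about similarity measures can be illustrated with a minimal sketch. The vectors below are hypothetical two-dimensional toy values, not embeddings or results from the survey, and the helper names are introduced here for illustration only. The sketch shows how the nearest neighbor of a query vector can differ depending on whether cosine similarity or Euclidean distance is used.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (ignores vector length)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between u and v (sensitive to vector length)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Hypothetical toy vectors (assumed values, chosen only to expose the effect).
query = [1.0, 0.0]
candidates = {
    "a": [3.0, 0.3],  # nearly the same direction as the query, but much longer
    "b": [0.6, 0.6],  # 45 degrees away from the query, but close in space
}

# Higher cosine similarity means "closer"; lower Euclidean distance means "closer".
nearest_by_cosine = max(
    candidates, key=lambda w: cosine_similarity(query, candidates[w])
)
nearest_by_euclidean = min(
    candidates, key=lambda w: euclidean_distance(query, candidates[w])
)

print("nearest by cosine similarity: ", nearest_by_cosine)
print("nearest by Euclidean distance:", nearest_by_euclidean)
```

In this toy setting the two measures disagree because cosine similarity ignores vector length while Euclidean distance does not. If all vectors are first normalized to unit length, the squared Euclidean distance equals 2 minus twice the cosine similarity, so the two rankings coincide.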
