From Division to Unity: A Large-Scale Study on the Emergence of Computational Social Science, 1990-2021

Honglin Bao, Jiawei Zhang, Mingxuan Cao, James A. Evans

TL;DR

This study tackles how a new interdisciplinary field, Computational Social Science (CSS), emerges and reshapes surrounding social sciences. It leverages a large-scale analysis of 11 million social science papers (1990–2021) using an ensemble CSS classifier and SPECTER2-based yearly embeddings, augmented by Word2Vec word-proximity analyses, and provides an interactive live demo of CSS evolution. The work identifies two pivotal inflection points—circa 2005 (early cross-field engagement) and 2014 (diffusion and boundary dissolution)—and shows how data-driven methods homogenize CSS across economics and political science while non-CSS strands diverge in some domains; sociology currently exhibits the strongest CSS engagement. Collectively, the results illuminate diffusion dynamics, identity formation, and the transformative influence of a nascent interdisciplinary field on established disciplines, offering a scalable framework for monitoring field evolution and diffusion in science.

Abstract

We present a comprehensive study on the emergence of Computational Social Science (CSS), an interdisciplinary field leveraging computational methods to address social science questions, and its impact on adjacent social sciences. We trained a robust CSS classifier using papers from CSS-focused venues and applied it to 11 million papers spanning 1990 to 2021. Our analysis yielded three key findings. First, there were two critical inflections in the rise of CSS. The first occurred around 2005, when psychology, politics, and sociology began engaging with CSS. The second emerged in approximately 2014, when economics finally joined the trend. Sociology is currently the most engaged with CSS. Second, using the density of yearly knowledge embeddings constructed by advanced transformer models, we observed that CSS initially lacked a cohesive identity. From the early 2000s to 2014, however, it began to form a distinct cluster, creating boundaries between CSS and other social sciences, particularly in politics and sociology. After 2014, these boundaries faded, and CSS increasingly blended with the social sciences. Third, shared data-driven methods homogenized CSS papers across disciplines, with politics and economics showing the most alignment due to the combined influence of CSS and causal identification. Nevertheless, non-CSS papers in sociology, psychology, and politics became more divergent. Taken together, these findings highlight the dynamics of division and unity as new disciplines emerge within existing knowledge landscapes. A live demo of CSS evolution is available at https://evolution-css.netlify.app/

Paper Structure

This paper contains 10 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: The reactions of social sciences to CSS from 1990-2021.
  • Figure 2: CSS in the embedding space. Panel (a) illustrates the cosine similarity between the central embeddings of CSS papers and non-CSS papers across different years and fields. Panel (b) depicts the dynamics of the normalized density of CSS papers over time.
  • Figure 3: Live demo (https://evolution-css.netlify.app/) of CSS evolution in 1990 (no clustering), 2014 (an identifiable cluster), and 2021 (cluster faded). Principal component analysis is used to reduce dimensions. Economics: red, politics: green, psychology: pink, sociology: yellow, CSS papers: blue. The x and y axes in all three plots range from -6 to 6. To ensure efficient visualization at scale, we sampled 10% of papers per year for this demo.
  • Figure 4: CSS papers in a pair of disciplines move closer to each other than the non-CSS papers in the same pair. Y-axis: cosine similarity. X-axis: year.
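
The cosine similarity between "central embeddings" mentioned in the captions for Figures 2(a) and 4 can be illustrated with a small sketch. It assumes that a central embedding is the arithmetic mean of the paper embeddings in a group; the source does not spell out the exact definition, so treat this as one plausible reading and not as the authors' implementation. The function names and the two-dimensional toy vectors are hypothetical, standing in for real SPECTER2 embeddings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Arithmetic mean of a non-empty collection of equal-length vectors.

    Assumption: the "central embedding" of a group is its mean vector.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def centroid_similarity(group_a: Sequence[Vector], group_b: Sequence[Vector]) -> float:
    """Cosine similarity between the centroids of two groups of embeddings."""
    return cosine_similarity(centroid(group_a), centroid(group_b))


# Hypothetical toy embeddings for one field in one year (not real data).
css_papers = [[1.0, 0.0], [0.8, 0.2]]
non_css_papers = [[0.0, 1.0], [0.2, 0.8]]

similarity = centroid_similarity(css_papers, non_css_papers)
print(f"CSS vs non-CSS centroid similarity: {similarity:.4f}")
```

Repeating this computation per year and per field (or per pair of fields) would give one time series per line of the kind plotted in those figures, again under the mean-vector assumption.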