Mapping the changing structure of science through diachronic periodical embeddings

Zhuoqi Lyu, Qing Ke

TL;DR

This study introduces diachronic periodical embeddings to quantify how the semantic makeup of scholarly journals evolves across decades. The authors construct decade-specific citation trails from a large corpus derived from the Microsoft Academic Graph (MAG) and train word2vec representations of journals on them. This gives an embedding $v_i^t$ for each journal $i$ in each decade $t$. They track how these embeddings shift and quantify the shifts through neighbor-based comparisons. The results show a general trend toward specialization for most journals, alongside increasing interdisciplinarity for bioscience journals. They further map journals into a physical-life-health ternary space, identify four clusters, and visualize trajectories that reveal topic evolution and interdisciplinary dynamics, including emerging topics such as AIDS research and nanotechnology. The work provides a quantitative framework for the science of science that helps identify evolving research fields, interdisciplinary junctions, and the birth of new topics. The approach is validated on case journals such as Nature, and the data and code are available for reproducibility.
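The summary mentions neighbor-based comparisons without giving a formula. The sketch below is a minimal illustration of one such comparison. It assumes the local neighborhood measure of Hamilton, Leskovec and Jurafsky (2016) and is not the authors' exact definition of $d^{t_1,t_2}$. For each periodical, it takes the union of its k nearest neighbors in two decades and compares the periodical's cosine-similarity profile to those neighbors across the two decades. Word2vec spaces trained separately per decade are not aligned, so this approach compares similarity profiles rather than raw vectors. The function names and the dict-of-vectors input are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_neighbors(emb, target, k):
    """The k periodicals most cosine-similar to `target` within one decade."""
    others = [p for p in emb if p != target]
    others.sort(key=lambda p: cosine(emb[target], emb[p]), reverse=True)
    return others[:k]

def semantic_change(emb_t1, emb_t2, target, k=10):
    """Hypothetical neighbor-based change of `target` between decades t1 and t2.

    emb_t1, emb_t2: dicts mapping periodical name -> embedding vector.
    Only periodicals present in both decades are compared.
    """
    shared = set(emb_t1) & set(emb_t2)
    e1 = {p: emb_t1[p] for p in shared}
    e2 = {p: emb_t2[p] for p in shared}
    # Union of the target's k nearest neighbors in either decade.
    neigh = sorted(set(nearest_neighbors(e1, target, k)) | set(nearest_neighbors(e2, target, k)))
    # Second-order vectors: the target's similarity to each neighbor, per decade.
    s1 = [cosine(e1[target], e1[p]) for p in neigh]
    s2 = [cosine(e2[target], e2[p]) for p in neigh]
    # 0 means the target relates to these neighbors the same way in both decades.
    return 1.0 - cosine(s1, s2)
```

Rotating every vector in one decade leaves the score at zero because rotation preserves cosine similarities. That property is what makes a second-order comparison usable across independently trained models.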

Abstract

Understanding the changing structure of science over time is essential to elucidating how science evolves. We develop diachronic embeddings of scholarly periodicals to quantify "semantic changes" of periodicals across decades, allowing us to track the evolution of research topics and identify rapidly developing fields. By mapping periodicals within a physical-life-health triangle, we reveal an evolving interdisciplinary science landscape, finding an overall trend toward specialization for most periodicals but increasing interdisciplinarity for bioscience periodicals. Analyzing a periodical's trajectory within this triangle over time allows us to visualize how its research focus shifts. Furthermore, by monitoring the formation of local clusters of periodicals, we can identify emerging research topics such as AIDS research and nanotechnology in the 1980s. Our work offers novel quantification in the science of science and provides a quantitative lens to examine the evolution of science, which may facilitate future investigations into the emergence and development of research fields.
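One way to monitor the formation of a local cluster is to check whether a periodical's neighborhood tightens over time. The Figure 5 caption measures this with $d$, the cosine distance from a periodical (there, AIDS) to its 10th nearest neighbor. The sketch below is a minimal illustration of $d$ and of a hypothetical $\Delta d$. Here $\Delta d$ is taken as the earlier-decade value minus the later one, so positive values indicate a tightening cluster. The paper's exact sign convention and decade pairing are assumptions, as are the function names.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def kth_neighbor_distance(emb, target, k=10):
    """Cosine distance from `target` to its k-th nearest neighbor in one decade.

    emb: dict mapping periodical name -> embedding vector for that decade.
    """
    dists = sorted(cosine_distance(emb[target], v) for p, v in emb.items() if p != target)
    return dists[k - 1]

def delta_d(emb_early, emb_late, target, k=10):
    """Hypothetical Δd: positive when the target's neighborhood has tightened."""
    return kth_neighbor_distance(emb_early, target, k) - kth_neighbor_distance(emb_late, target, k)
```

Ranking periodicals by this quantity and keeping the top 10% mirrors the selection described for Figure 5(B–D), under the same assumptions.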

Paper Structure

This paper contains 17 sections, 1 equation, 29 figures, 9 tables.

Figures (29)

  • Figure 1: Validating diachronic embeddings using Nature. (A) Percentage of papers in Earth and Planetary Sciences and Physics published in Nature by decade. For simplicity, papers in the 2010s refer to those published in 2010–2021. (B) Relative similarity between Nature and the two focal disciplines. Relative similarity is defined as the average cosine similarity between Nature and all periodicals belonging to that discipline, divided by the average cosine similarity between Nature and all periodicals. (C) The correspondence between publication volume and relative similarity. Color represents discipline and shape marks decade.
  • Figure 2: Quantifying semantic change, $d^{t_1,t_2}$, of a periodical. (A) Two-dimensional visualization of the semantic change of PRSB (Proceedings of the Royal Society B) based on its diachronic embeddings. From the 1970s to the 1990s, it shifted from a cluster of biology periodicals to computer vision and then to ecology. (B–J) $d^{t_1,t_2}$ for individual periodicals over time. Numbers in parentheses in the panel titles are the total $d^{t_1,t_2}$ over time. Figs. S14–S15 provide more examples.
  • Figure 3: Mapping periodicals within the physical-life-health triangle. (A) A ternary plot showing the distribution of all periodicals in the 2010s with respect to three conceptual axes: physical science, life science, and health science. Color denotes the research area assigned by Scopus. (B) The same ternary plot, with periodicals colored by cluster labels generated by $k$-means on their ternary coordinates. Each cluster is labeled with its most common Scopus area. (C) The same ternary plot, with periodicals colored by the level of disagreement between the $k$-means clustering and the Scopus labels; darker colors indicate larger disagreement. Eight misclassified periodicals are highlighted: the fill color of each marker indicates its cluster label and the edge color indicates its Scopus research area. (D–J) Heatmaps of the disagreement between $k$-means clustering and Scopus labels for each decade, interpolated with inverse distance weighting (IDW). Numbers in the titles are the average similarity across all periodicals.
  • Figure 4: Charting evolution traces of periodicals within the physical-life-health triangle. We show trajectories of closeness to the three research areas for 15 periodicals and the averaged trajectories over all periodicals in each category (the last column). Each trajectory is formed by sequentially connecting the positions in the triangle with arrows, from the 1950s (or the decade of establishment) to the 2010s.
  • Figure 5: Detecting emerging research topics. (A) Two-dimensional visualizations of AIDS (stars) and its 10 nearest neighbors (circles) in each decade. The red line marks $d$, the cosine distance from AIDS to its 10th nearest neighbor. (B–D) The most representative words appearing in the titles of the top 10% of periodicals ranked by $\Delta d$, among periodicals established in the (B) 1970s, (C) 1980s, and (D) 1990s.
  • ...and 24 more figures
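Figure 1(B) defines relative similarity as a ratio of two averages. The numerator is the average cosine similarity between Nature and the periodicals of one discipline. The denominator is the average cosine similarity between Nature and all periodicals. The sketch below is a minimal illustration of that ratio. It assumes one decade's embeddings are held in a plain dict and that the focal periodical is excluded from both averages; the caption does not say whether it is. The function name is hypothetical.

```python
import math
from statistics import mean

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def relative_similarity(emb, focal, members):
    """Mean similarity of `focal` to a discipline, divided by its mean similarity to all periodicals.

    emb: dict periodical -> vector for one decade; members: periodicals in the discipline.
    Assumes the focal periodical is excluded from both averages.
    """
    v = emb[focal]
    in_discipline = mean(cosine(v, emb[p]) for p in members if p != focal and p in emb)
    overall = mean(cosine(v, u) for p, u in emb.items() if p != focal)
    return in_discipline / overall
```

A value above 1 means the focal periodical sits closer to that discipline than to the corpus as a whole. The ratio is only meaningful when the overall average similarity is positive.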
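Figure 3(D–J) interpolates disagreement scores across the triangle with inverse distance weighting (IDW). The sketch below is a minimal illustration of Shepard-style IDW with a power parameter of 2. It assumes ternary coordinates are first projected to the plane with a hypothetical vertex placement: physical at (0, 0), life at (1, 0), and health at (0.5, √3/2). The paper's grid, power parameter, and projection are not stated here, and the function names are illustrative.

```python
import math

def ternary_to_xy(physical, life, health):
    """Project ternary coordinates to 2-D (hypothetical vertex placement).

    physical -> (0, 0), life -> (1, 0), health -> (0.5, sqrt(3)/2).
    """
    total = physical + life + health
    life, health = life / total, health / total
    return (life + health / 2.0, health * math.sqrt(3) / 2.0)

def idw(samples, query, power=2.0):
    """Shepard inverse-distance-weighted estimate at `query`.

    samples: list of ((x, y), value) pairs; query: (x, y).
    """
    num = den = 0.0
    for (x, y), value in samples:
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return value  # exact hit: return the observed value unchanged
        w = 1.0 / d ** power  # closer samples get larger weights
        num += w * value
        den += w
    return num / den
```

Evaluating `idw` on a regular grid of projected points inside the triangle yields a heatmap like those in panels D–J. A larger `power` makes the surface follow the nearest periodicals more closely.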