Become a better you: correlation between the change of research direction and the change of scientific performance

Xiaoyao Yu, Boleslaw K. Szymanski, Tao Jia

TL;DR

This study addresses how changes in a scientist's research direction relate to changes in scientific performance. It introduces a continuous direction-change metric based on PACS-derived topic vectors and analyzes a large APS physics dataset to link direction changes with impact and productivity outcomes. The key finding is a strong positive association between direction change and impact growth (measured via field-normalized citations and relative gain), while direction change does not reliably predict productivity changes after controlling for career length. These results offer insight into the exploration-exploitation dynamics of scientific careers, albeit with survivor bias and data limitations that call for broader validation across datasets and failure cases.

Abstract

It is important to explore how scientists decide their research agenda and the corresponding consequences, as their decisions collectively shape contemporary science. There are studies focusing on the overall performance of individuals with different problem-choosing strategies. Here we ask a slightly different but relatively unexplored question: how is a scientist's change of research agenda associated with her change of scientific performance? Using publication records of over 14,000 authors in physics, we quantitatively measure the extent of research direction change and the performance change of individuals. We identify a strong positive correlation between the direction change and impact change. Scientists with a larger direction change are not only more likely to produce works with increased scientific impact compared to their past ones, but also have a higher growth rate of scientific impact. On the other hand, the direction change is not associated with productivity change. Those who stay in familiar topics do not publish faster than those who venture out and establish themselves in a new field. The gauge of research direction in this work is uncorrelated with the diversity of research agenda and the switching probability among topics, capturing the evolution of individual careers from a new point of view. Though the finding is inevitably affected by survival bias, it sheds light on a range of problems in the career development of individual scientists.

Paper Structure

This paper contains 10 sections, 1 equation, 34 figures, 1 table.

Figures (34)

  • Figure 1: (a) An example demonstrating the procedure to compose the topic tuple and topic vector $g$. For two topic tuples (66, 68) and (05, 61, 68), the element value in $g$ of topic 66 is calculated as $\frac{1/2+0}{2} = \frac{1}{4}$, as it appears once in one topic tuple and is not included in the other. The element value in $g$ of topic 68 is calculated as $\frac{1/2+1/3}{2} = \frac{5}{12}$, as it appears once in each of the topic tuples. Similarly, the element values in $g$ of topics 05 and 61 are each calculated as $\frac{0+1/3}{2} = \frac{1}{6}$. (b) The scenario that takes the first and the last $m$ papers in a scientist's publication sequence to obtain the direction change $J$, its distribution $P$, and the growth fraction and growth rate of the scientific impact and productivity, $P_c$, $R_c$, $P_t$, and $R_t$. (c) The scenario that uses two adjacent sequences of $m$ papers randomly chosen from a scientist's publication sequence. Correspondingly, the quantities obtained are denoted by $\tilde{J}$, $\tilde{P}$, $\tilde{P}_c$, $\tilde{R}_c$, $\tilde{P}_t$, and $\tilde{R}_t$.
  • Figure 2: (a) $P_c$ conditioning on the range $(J-0.025, J+0.025]$ is positively correlated with $J$. The dashed line represents the linear regression. (b) The average $\tilde{P}_c$ conditioning on the range $(\tilde{J}-0.025, \tilde{J}+0.025]$ is positively correlated with $\tilde{J}$. (c) The average $R_c$ conditioning on the range $(J-0.025, J+0.025]$ is positively correlated with $J$. (d) The average $\tilde{R}_c$ conditioning on the range $(\tilde{J}-0.025, \tilde{J}+0.025]$ is positively correlated with $\tilde{J}$. At the boundary $J=0$ and $\tilde{J}=0$, the range $[0, 0.05]$ is used, and the same boundary condition applies in all the analyses of $J$ and $\tilde{J}$. The scatter plots of $P_c$, $\tilde{P}_c$, $R_c$, and $\tilde{R}_c$ are displayed in Fig. S2. The value of $b$ is defined as the slope of the corresponding linear regression function (the dashed line). *** $p < 0.001$, ** $p < 0.05$, * $p < 0.1$ ($t$-test for Pearson coefficient $r$). Error bars represent one standard deviation of the mean.
  • Figure 3: (a) $\tilde{P}_t$ is not correlated with $\tilde{J}$ for a range of values ($0 \leq \tilde{J} \leq 0.725$) that covers over 97% of the sample and has small standard deviations. Due to the relatively small sample size (no more than 100) and high standard deviation for each group of $\tilde{J}$ in the range $0.725 < \tilde{J} \leq 1.0$, we do not take this range into discussion. (b) $P_t$ increases with $J$, the slope of which is almost the same as the one predicted by the correlations between $P_t$ and $n$ as well as $n$ and $J$ (Note S6). (c) The average output $n$ conditioning on the range of direction change $(J-0.025, J+0.025]$ is positively correlated with $J$. (d) $P_t$ is positively correlated with the output $n$. (e) After subtracting the increase induced by the pairwise dependence between $n$ and $J$ as well as $n$ and $P_t$, the result indicates that $P'_t$ and $J$ are uncorrelated. The value of $b$ is defined as the slope of the corresponding linear regression function (the dashed line). *** $p < 0.001$, ** $p < 0.05$, * $p < 0.1$ ($t$-test for Pearson coefficient $r$).
  • Figure 4: (a) For each scientist, we plot her $J$ versus switching probability (grey circle), and the mean value of $J$ conditioning on the range (switching probability - 0.025, switching probability + 0.025] (scatter with line). The result shows that switching probability is not correlated with $J$ at the individual level ($p>0.1$). (b) For each scientist, we calculate the average value $\langle \tilde{J} \rangle$ of her $n-1$ values of $\tilde{J}$. Then we plot her $\langle \tilde{J} \rangle$ versus switching probability (grey circle), and the mean value of $\langle \tilde{J} \rangle$ conditioning on the range (switching probability - 0.025, switching probability + 0.025] (scatter with line). The result shows that switching probability is not correlated with $\langle \tilde{J} \rangle$ at the individual level ($p>0.1$). (c) For scientists whose impact has increased ($\bar{c}_{2,f} > \bar{c}_{2,i}$), we plot their $J$ versus the change of Shannon entropy $\Delta H$ (grey circle). The result shows that $\Delta H$ and $J$ are not correlated ($p>0.1$). The average $\Delta H$ is close to 0, indicating that diversity change is not associated with the impact increase. (d) Similar to (c), but the mean value of $\Delta H$ is taken for individuals within the range $(J-0.025, J+0.025]$. The value of $b$ is defined as the slope of the corresponding linear regression function (the dashed line). Error bars represent one standard deviation of the mean.
  • Figure S1: (a) The fraction of scientists $P$ within a range of $(J-0.025, J+0.025]$ drops exponentially with $J$. (b) The fraction of scientists $\tilde{P}$ within a range of $(\tilde{J}-0.025, \tilde{J}+0.025]$ drops exponentially with $\tilde{J}$. (c) For a scientist with $n$ papers, we calculate the average value $\langle \tilde{J} \rangle$ of her $n-1$ values of $\tilde{J}$. Then we plot her $\langle \tilde{J} \rangle$ versus $J$ (grey circle), and the mean value of $\langle \tilde{J} \rangle$ conditioning on the range $(J-0.025, J+0.025]$ (scatter with line). The result shows that $J$ and $\tilde{J}$ are consistent at the individual level. The value of $b$ is defined as the slope of the corresponding linear regression function (the dashed line). *** $p < 0.001$, ** $p < 0.05$, * $p < 0.1$ ($t$-test for Pearson coefficient $r$). Error bars represent one standard deviation of the mean.
  • ...and 29 more figures
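
The topic-vector arithmetic in the caption of Figure 1(a) can be reproduced with a short sketch. This is only an illustration of the worked example given there, not the authors' code: the function name `topic_vector` and the use of strings for PACS codes are assumptions, and each tuple is assumed to list a topic at most once. Each paper spreads a total weight of 1 equally over its topics, and the vector $g$ is the average of these per-paper weights.

```python
from collections import defaultdict
from fractions import Fraction


def topic_vector(topic_tuples):
    """Return {topic: weight} for a sequence of per-paper topic tuples.

    Hypothetical helper: each paper contributes 1/len(tuple) to every topic
    it lists, and the contributions are averaged over all papers.
    """
    totals = defaultdict(Fraction)  # missing topics start at Fraction(0)
    for topics in topic_tuples:
        share = Fraction(1, len(topics))
        for topic in topics:
            totals[topic] += share
    n_papers = len(topic_tuples)
    return {topic: weight / n_papers for topic, weight in totals.items()}


# The two topic tuples from the Figure 1(a) caption.
g = topic_vector([("66", "68"), ("05", "61", "68")])
for topic in sorted(g):
    print(topic, g[topic])
```

Exact fractions are used so that the values can be compared directly with those in the caption (1/4 for topic 66, 5/12 for topic 68, and 1/6 for topics 05 and 61). The elements of $g$ sum to 1 by construction.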