Measuring and Analyzing Subjective Uncertainty in Scientific Communications

Jamshid Sourati, Grace Shao

Abstract

Uncertainty in scientific findings is typically reported through statistical metrics such as $p$-values and confidence intervals. The magnitude of this objective uncertainty is reflected in the language authors use to report their findings, primarily through expressions containing uncertainty-inducing terms or phrases. This language uncertainty is subjective and depends heavily on the authors' writing style. There is evidence that such subjective uncertainty influences the impact of science on the general public. In this work, we turned our focus to scientists themselves, and measured and analyzed subjective uncertainty and its impact within scientific communities across different disciplines. We showed that the level of this type of uncertainty varies significantly across fields, publication years, and geographical locations. We also studied the correlation between subjective uncertainty and several bibliographic metrics, such as the number and gender of authors, the centrality of the field's community, and citation counts. The patterns identified in this work are useful for identifying and documenting linguistic norms of scientific communication in different communities and societies.
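The abstract describes subjective uncertainty as being carried by uncertainty-inducing terms or phrases. As a minimal illustration of that idea only, the sketch below scores a sentence by the share of its tokens that appear in a small hedge-cue list. The cue list, the names `HEDGE_CUES`, `hedge_density`, and `certainty_score`, and the "1 minus density" mapping are all hypothetical; this is not the measurement approach used in the paper, which compared several methods against the authors' annotations (Figure S1).

```python
import re

# Hypothetical list of uncertainty-inducing cues (illustrative only,
# not the lexicon or model used in the paper).
HEDGE_CUES = frozenset({
    "may", "might", "could", "possibly", "perhaps", "likely", "unlikely",
    "suggest", "suggests", "appear", "appears", "seem", "seems",
    "potentially", "probably", "presumably",
})


def hedge_density(sentence: str) -> float:
    """Fraction of alphabetic tokens in `sentence` that are hedge cues (0.0 if no tokens)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in HEDGE_CUES)
    return hits / len(tokens)


def certainty_score(sentence: str) -> float:
    """Map hedge density to a score in [0, 1]; higher means more certain language."""
    return 1.0 - hedge_density(sentence)


if __name__ == "__main__":
    for s in ("We show that X increases Y.",
              "These results may suggest a weak effect."):
        print(f"{certainty_score(s):.3f}  {s}")
```

A bag-of-cues baseline like this ignores context such as negation or scope, so it should be read as a rough approximation of language certainty rather than a validated measure.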

Paper Structure

This paper contains 26 sections, 1 equation, 11 figures, and 3 tables.

Figures (11)

  • Figure S1: (a) Three types of metrics used in our correlation analysis: pre-publication metrics describing the coauthorship graph of the community of a paper's subfield, based on the most recent 10-year time window preceding the publication time; publication metrics characterizing the author team and the publishing venue; and post-publication metrics measuring the impact of the paper in internal and external communities. (b, c) Performance of the uncertainty measurement approaches we tried: (b) alignment of each method's output with our annotations, based on correlation coefficients; (c) density estimates of the uncertainty scores of the competing methods on the evaluation dataset.
  • Figure S2: (a) Percentage of overlap between pairs of disciplines involved in our study. (b–c) Distribution of estimated language certainty: (b) distribution of the certainty measure for all papers involved in this study, and (c) average and standard deviation of the measurements in each field. (d–f) Temporal patterns of quantified certainty scores for computational fields (d), life sciences (e), and social sciences (f).
  • Figure S3: Correlation analysis of language certainty with respect to bibliographic metrics: (a–g) temporal (partial) correlations between measured certainty and (a) number of authors (team size), (b) probability that the first author is male, (c) interdisciplinarity of the author team, (d) journal rank, (e) centrality of the subfield's network, (f) echo-chamber effect, and (g) citation counts. (h) Percentage decrease in language certainty of papers with at least one Twitter mention relative to those without any mentions (left axis), and logarithm of the number of papers with at least one Twitter mention per field (right axis). (i) Partial correlations between language certainty and the number of Twitter mentions after controlling for the rank of the publishing journal.
  • Figure S4: (a) Worldwide heatmap of average subjective certainty in physics publications for countries that have been assigned at least 50 papers with certainty measurements. Darker colors represent lower average certainty. This heatmap considers all publications in our dataset regardless of publication year. (b) Spearman correlations between the annual average certainty of publications and the year of publication, shown separately for each country. Correlations with $p$-values larger than 0.1 were excluded from the heatmap. (c) Certainty scores of articles averaged across six regional groups.
  • Figure A1: Centrality (C) and echo-chamber effect (E) in different networks. (a) A sample network. In this and the other sample networks, the green shaded region represents the community of the target subfield: green nodes belong to the community (denoted by $\mathcal{M}$), and non-green nodes belong to a neighboring external community. Blue nodes are the neighbors, denoted by $\mathcal{N}$ in the equation of the echo-chamber effect (see the appendix subsection on the echo-chamber effect). (b) Both metrics are high; (c) centrality is high but the echo-chamber effect is low; (d) centrality is low but the echo-chamber effect is high; (e) both metrics are low.
  • ...and 6 more figures
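Figure S1(b) evaluates each uncertainty measurement method by correlating its output with the annotations. The caption does not say which coefficient was used, so the sketch below computes both Pearson and Spearman (rank, with averaged ties) correlations using only the standard library. The function names and the example scores are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either input has zero variance."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    annotations = [0.9, 0.2, 0.6, 0.4, 0.8]  # hypothetical annotated certainty
    method_scores = {                        # hypothetical method outputs
        "method_a": [0.85, 0.30, 0.55, 0.50, 0.70],
        "method_b": [0.40, 0.60, 0.50, 0.20, 0.90],
    }
    for name, scores in method_scores.items():
        print(f"{name}: pearson={pearson(scores, annotations):.3f} "
              f"spearman={spearman(scores, annotations):.3f}")
```

Reporting both coefficients separates methods whose scores are linearly aligned with the annotations from those that only preserve their ordering.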
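Figure S3(i) reports partial correlations between language certainty and the number of Twitter mentions after controlling for journal rank, and Figure S3(h) reports a percentage decrease in certainty for mentioned papers. The sketch below assumes Pearson correlations with the standard first-order partial-correlation formula, and assumes the percentage decrease compares the mean certainty of the two groups; the captions specify neither detail, and all names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """First-order partial correlation of x and y controlling for z:

        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

    e.g. x = certainty, y = Twitter mentions, z = journal rank (hypothetical roles).
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def percent_decrease(with_mentions: Sequence[float],
                     without_mentions: Sequence[float]) -> float:
    """Percentage by which mean certainty of mentioned papers falls below unmentioned ones."""
    base = mean(without_mentions)
    return 100.0 * (base - mean(with_mentions)) / base
```

The formula gives the same value as correlating the residuals of x and y after each is linearly regressed on z, which is a convenient way to check an implementation or to extend it to several control variables.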
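Figure S4(b) keeps a country's Spearman correlation between annual average certainty and publication year only when its $p$-value is at most 0.1. The sketch below shows that filtering under stated assumptions: records are hypothetical (country, year, certainty) tuples, the 50-paper minimum is borrowed from panel (a), at least three distinct years are required, and $p$-values come from a seeded two-sided permutation test, which may differ from the test the authors used. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def _ranks(values):
    """1-based average ranks (ties share the mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return _pearson(_ranks(x), _ranks(y))


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for Spearman's rho (add-one corrected)."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # each shuffle is a fresh uniform permutation
        if abs(spearman(x, y_perm)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def country_trends(records, min_papers=50, alpha=0.1):
    """Return {country: rho} for countries with >= min_papers papers and p <= alpha."""
    by_country = defaultdict(lambda: defaultdict(list))
    for country, year, certainty in records:
        by_country[country][year].append(certainty)
    result = {}
    for country, by_year in by_country.items():
        n_papers = sum(len(v) for v in by_year.values())
        if n_papers < min_papers or len(by_year) < 3:  # assumed minimum of 3 years
            continue
        years = sorted(by_year)
        annual_means = [sum(by_year[y]) / len(by_year[y]) for y in years]
        rho = spearman(years, annual_means)
        if math.isnan(rho):  # constant annual means: correlation undefined
            continue
        if permutation_pvalue(years, annual_means) <= alpha:
            result[country] = rho
    return result
```

Averaging within each year before correlating, as the caption describes, gives every year equal weight regardless of how many papers a country published in it.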