Linking Global Science Funding to Research Publications

Jacob Aarup Dalsgaard, Filipi Nascimento Silva, Jin AI

Abstract

Funding acknowledgments in scholarly publications provide large-scale trace data on organizations that support scientific research. We present a dataset for linking global science funding organizations to research publications by systematically disambiguating unique funding acknowledgment strings extracted from publication metadata. Funder names are matched to standardized organizational identifiers using a multi-stage pipeline that combines lexical normalization, similarity-based clustering, rule-based matching, matching assisted by named entity recognition, and manual validation. The resulting dataset links 1.9 million unique funder strings to canonical organization identifiers and records match types and unresolved cases to support transparency. Technical validation includes paper-level comparisons across bibliometric sources and manual verification against full-text acknowledgment sections, with recall and precision reported for each. This dataset supports analyses of funding flows, institutional funding portfolios, regional representation, and concentration patterns in the global research system.

Paper Structure (15 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Coverage of publications containing funding information across bibliometric databases. The left axis shows the yearly proportion of publications with at least one funding acknowledgement (line); the right axis shows the yearly count of publications with at least one funding acknowledgement (bars).
  • Figure 2: Overview of the funder name disambiguation pipeline. Funding agency name strings extracted from Web of Science acknowledgement metadata are first normalized and clustered using MinHash locality-sensitive hashing to group similar strings. Candidate clusters are then matched to organizations in a reference index derived from OpenAlex and ROR using a sequence of deterministic rules: exact-name matching, alternate-name matching, substring matching, and acronym matching. High-frequency unresolved strings are manually reviewed, and medium-frequency cases are processed using named entity recognition. Remaining unmatched strings are evaluated using a similarity-based fallback procedure before the final mapping between funding strings and canonical organization identifiers is produced.
  • Figure 3: Country-level coverage of grant attribution by author affiliation. Maps show the proportion of publications with recorded funding information by country of author affiliation in (a) Dimensions, (b) Web of Science, and (c) OpenAlex.
  • Figure 4: Geographic distribution of funding agencies across bibliometric databases. Maps show the proportion of publications with funding acknowledgements attributed to funders located in each country. Panels compare the geographic distribution of funder-linked publications in (a) Dimensions, (b) Web of Science, and (c) OpenAlex.
  • Figure 5: Rank–frequency distribution of publications attributed to funding organizations. Funding agencies are ranked by the number of publications acknowledging support in the Web of Science dataset. The figure shows the cumulative distribution of publications across funders, illustrating the concentration of funding attribution among a relatively small number of organizations. Sector-specific distributions (government, corporate, and philanthropic) are also shown to highlight differences in the concentration of funding activity across organizational types.
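
The first stages of the pipeline in Figure 2 (normalization, MinHash-based grouping, and the deterministic rule cascade) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the stopword list, the band and row counts, and the toy `REFERENCE` index with its `org:` identifiers are all hypothetical, and a small pure-Python MinHash stands in for whatever library the pipeline actually uses.

```python
import hashlib
import re
import unicodedata
from collections import defaultdict

STOPWORDS = {"of", "the", "for", "and", "in"}  # assumed list, for acronyms only

# Toy reference index; identifiers and entries are hypothetical placeholders.
REFERENCE = {
    "org:0001": {"name": "National Science Foundation", "alt": ["NSF"]},
    "org:0002": {"name": "European Research Council", "alt": ["ERC"]},
    "org:0003": {"name": "Swiss National Science Foundation", "alt": []},
}


def normalize(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def minhash(text, num_perm):
    """MinHash signature: minimum salted hash of the character 3-grams."""
    grams = {text[i:i + 3] for i in range(max(1, len(text) - 2))}
    return tuple(
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{g}".encode(), digest_size=8).digest(),
                "big",
            )
            for g in grams
        )
        for seed in range(num_perm)
    )


def candidate_groups(strings, bands=8, rows=4):
    """LSH banding: strings sharing any signature band become candidates."""
    buckets = defaultdict(set)
    for s in strings:
        sig = minhash(normalize(s), bands * rows)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(s)
    return {frozenset(g) for g in buckets.values() if len(g) > 1}


def acronym(text):
    """Initial letters of the non-stopword tokens of a normalized name."""
    return "".join(w[0] for w in text.split() if w not in STOPWORDS)


def match(raw, reference):
    """Apply the rules in order and return (organization id, match type)."""
    query = normalize(raw)
    names = {org: normalize(rec["name"]) for org, rec in reference.items()}
    for org, canonical in names.items():  # rule 1: exact name
        if query == canonical:
            return org, "exact"
    for org, rec in reference.items():  # rule 2: alternate name
        if query in {normalize(a) for a in rec["alt"]}:
            return org, "alternate"
    for org, canonical in names.items():  # rule 3: canonical name inside string
        if canonical in query:
            return org, "substring"
    for org, canonical in names.items():  # rule 4: acronym of canonical name
        if query == acronym(canonical):
            return org, "acronym"
    return None, "unresolved"


if __name__ == "__main__":
    for raw in ["NSF", "U.S. National Science Foundation (NSF)", "SNSF"]:
        print(raw, "->", match(raw, REFERENCE))
```

Each rule runs over the whole index before the next rule is tried, so a string that exactly names one organization is never captured by a looser substring hit on another. Returning the match type alongside the identifier mirrors the dataset's practice of recording match types and unresolved cases.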