
Patent-publication pairs for the detection of knowledge transfer from research to industry: reducing ambiguities with word embeddings and references

Klaus Lippert, Konrad U. Förstner

TL;DR

This study set out to identify publication-patent pairs in order to use patents as a proxy for the economic impact of research, and developed a statistical procedure which can be used to determine valid patent classes for the domain of medicine.

Abstract

The performance of medical research can be viewed and evaluated not only from the perspective of publication output, but also from the perspective of economic exploitability. Patents can represent the exploitation of research results and thus the transfer of knowledge from research to industry. In this study, we set out to identify publication-patent pairs in order to use patents as a proxy for the economic impact of research. To identify these pairs, we matched scholarly publications and patents by comparing the names of authors and inventors. To resolve the ambiguities that arise in this name-matching process, we extended our approach with two additional filter features: one assesses the similarity of text content, the other identifies references common to both document types. To evaluate text similarity, we extracted technical terms from a medical ontology (MeSH) and transformed them into numerical vectors using word embeddings. We then calculated the results of the two supporting features over an example five-year period. Furthermore, we developed a statistical procedure that can be used to determine valid patent classes for the domain of medicine. Our complete data processing pipeline is freely available, from the raw data of the two document types right through to the validated publication-patent pairs.
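The three signals described in the abstract (shared author/inventor names, text similarity, and common references) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the dictionary keys (`names`, `refs`, `vec`), and the simple name normalization are assumptions made for illustration, and the vectors stand in for embeddings of MeSH terms that would come from an embedding model.

```python
import math


def normalize_name(name):
    """Lowercase and collapse whitespace (a deliberately naive, assumed rule)."""
    return " ".join(name.lower().split())


def shared_names(authors, inventors):
    """Names that appear on both the publication and the patent."""
    return {normalize_name(a) for a in authors} & {normalize_name(i) for i in inventors}


def shared_references(pub_refs, pat_refs):
    """Reference identifiers cited by both documents."""
    return set(pub_refs) & set(pat_refs)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_pair(pub, pat):
    """Collect the three signals for one candidate publication-patent pair.

    `pub` and `pat` are hypothetical dicts with keys 'names', 'refs', 'vec'.
    """
    return {
        "common_names": len(shared_names(pub["names"], pat["names"])),
        "common_refs": len(shared_references(pub["refs"], pat["refs"])),
        "cosine": cosine_similarity(pub["vec"], pat["vec"]),
    }


# Toy records; the vectors are placeholders, not real MeSH embeddings.
publication = {"names": ["Doe J", "Roe A"], "refs": ["ref-1", "ref-2"], "vec": [1.0, 0.0]}
patent = {"names": ["doe  j"], "refs": ["ref-2"], "vec": [1.0, 0.0]}
result = score_pair(publication, patent)
```

A candidate pair found by name matching alone is ambiguous when names are common; a high cosine similarity or at least one shared reference gives independent evidence that the two documents describe the same work.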

Paper Structure

This paper contains 17 sections, 8 figures.

Figures (8)

  • Figure 1: Number of patent families from the EPO and publications from the PubMed baseline dataset for the study period, plus the number of each document type involved in the raw publication-patent pairs.
  • Figure 2: Simplified workflow
  • Figure 3: Language of patent descriptions used for MeSH extraction.
  • Figure 4: Cosine similarities of patent-publication pairs, grouped by the number of common author/inventor names. Cosine similarity is derived from the BERT base model using MeSH main headings from patents and publications. The number of patent-publication pairs is also given for each count of common names.
  • Figure 5: Cosine similarities of patent-publication pairs, grouped by the number of common references. Cosine similarity is derived from the BERT base model using MeSH main headings from patents and publications. The number of patent-publication pairs is also given for each count of common references.
  • ...and 3 more figures