Seeing Through Green: Text-Based Classification and the Firm's Returns from Green Patents

Lapo Santarlasci, Armando Rungi, Antonio Zinilli

TL;DR

The paper develops a text-based NLP framework to identify truly environmental (true-green) patents from a large pool of patents labeled green by traditional classifications. Using a continuous bag-of-words Word2Vec approach trained on about 12.4 million patents, the authors expand a baseline green-technology dictionary and perform regex matching to extract true greens, revealing that only about 18.5% of literature-classified green patents are genuinely environmental, with true greens comprising roughly 2.5–3.5% of total patents over 2010–2022. Incorporating novelty measures (Arts et al., 2021) raises the share of true greens to about 49% of the total under the novelty lens, illustrating substantial under- and over-classification biases in standard schemes. Linking true-green patent ownership to EU firm data through propensity score matching shows that firms with at least one true-green patent exhibit substantially higher sales, market shares, and labor productivity, while profitability effects are mixed and sensitive to robustness checks and firm size; high-novelty true greens tend to amplify sales but may lag in profitability due to R&D costs. The study demonstrates the value of text analysis for more precise policy evaluation and resource allocation, and it highlights the potential for greenwashing if reliance on broad classifications persists.
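
The dictionary-expansion and regex-matching steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors in `EMBEDDINGS` stand in for trained Word2Vec embeddings, and the similarity threshold, the seed term, and all function names are assumptions made for illustration only.

```python
import math
import re

# Toy embeddings standing in for trained CBOW Word2Vec vectors.
# The values are hypothetical and chosen only to make the example readable.
EMBEDDINGS = {
    "solar": (0.9, 0.1, 0.0),
    "photovoltaic": (0.85, 0.15, 0.05),
    "wind": (0.7, 0.3, 0.1),
    "turbine": (0.6, 0.4, 0.2),
    "gearbox": (0.1, 0.9, 0.3),
    "polymer": (0.0, 0.2, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand_dictionary(seeds, embeddings, threshold=0.98):
    """Add every vocabulary term whose vector is close to some seed term.

    The threshold is an assumed value, not one reported in the paper.
    """
    expanded = set(seeds)
    for seed in seeds:
        for term, vec in embeddings.items():
            if cosine(embeddings[seed], vec) >= threshold:
                expanded.add(term)
    return expanded


def build_pattern(terms):
    """Compile one case-insensitive, whole-word regex over all terms."""
    alternatives = "|".join(re.escape(t) for t in sorted(terms))
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def is_true_green(text, pattern):
    """Flag a patent text as true green if any dictionary term occurs in it."""
    return pattern.search(text) is not None


seed_terms = {"solar"}  # hypothetical baseline dictionary
green_terms = expand_dictionary(seed_terms, EMBEDDINGS)
green_pattern = build_pattern(green_terms)
```

In this toy setup the seed "solar" pulls in "photovoltaic" but not "wind" or "gearbox", so a text mentioning a photovoltaic module is flagged while one about a polymer gearbox housing is not. In practice the threshold and the matching rule would need to be validated against labeled patents.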

Abstract

This paper introduces Natural Language Processing for identifying "true" green patents from official supporting documents. We start our training on about 12.4 million patents that had been classified as green from previous literature. Thus, we train a simple neural network to enlarge a baseline dictionary through vector representations of expressions related to environmental technologies. After testing, we find that "true" green patents represent about 20% of the total of patents classified as green from previous literature. We show heterogeneity by technological classes, and then check that "true" green patents are about 1% less cited by following inventions. In the second part of the paper, we test the relationship between patenting and a dashboard of firm-level financial accounts in the European Union. After controlling for reverse causality, we show that holding at least one "true" green patent raises sales, market shares, and productivity. If we restrict the analysis to high-novelty "true" green patents, we find that they also yield higher profits. Our findings underscore the importance of using text analyses to gauge finer-grained patent classifications that are useful for policymaking in different domains.
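
The TL;DR names propensity score matching as the tool for comparing firms with and without a true-green patent. A minimal sketch of the matching step might look as follows, assuming propensity scores have already been estimated elsewhere. The firm records, the caliper, nearest-neighbor matching with replacement, and the function names are all hypothetical and are not taken from the paper.

```python
from statistics import mean

# Hypothetical firm records:
# (firm_id, holds_true_green_patent, propensity_score, log_sales)
FIRMS = [
    ("A", True, 0.80, 5.0),
    ("B", True, 0.40, 4.0),
    ("C", False, 0.78, 4.5),
    ("D", False, 0.42, 3.8),
    ("E", False, 0.10, 2.0),
]


def match_nearest(firms, caliper=0.05):
    """Pair each treated firm with the control firm closest in propensity score.

    Matching is with replacement; treated firms with no control inside the
    caliper (an assumed tolerance) are dropped.
    """
    treated = [f for f in firms if f[1]]
    controls = [f for f in firms if not f[1]]
    pairs = []
    for t in treated:
        best = min(controls, key=lambda c: abs(c[2] - t[2]))
        if abs(best[2] - t[2]) <= caliper:
            pairs.append((t, best))
    return pairs


def att(pairs):
    """Average outcome gap between treated firms and their matched controls."""
    return mean(t[3] - c[3] for t, c in pairs)


pairs = match_nearest(FIRMS)
effect = att(pairs)
```

Here firm A is matched to C and firm B to D, while E is never used because its score is far from any treated firm. The resulting average gap in log sales is only as credible as the propensity model behind the scores, which this sketch takes as given.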

Paper Structure

This paper contains 17 sections, 7 equations, 6 figures, 24 tables.

Figures (6)

  • Figure 1: Neural network architecture
  • Figure 2: Density plot of the shares of true green patents by CPC category.
  • Figure 3: Density plot of the RCA index by CPC category.
  • Figure 4: Granted green patents (grey) and true green patents (black) over total by publication year.
  • Figure A1: Grid search for the selection of context window and minimum count hyperparameters.
  • ...and 1 more figure