Seeing Through Green: Text-Based Classification and the Firm's Returns from Green Patents
Lapo Santarlasci, Armando Rungi, Antonio Zinilli
TL;DR
The paper develops a text-based NLP framework to identify truly environmental (true-green) patents within a large pool of patents labeled green by traditional classifications. Using a continuous bag-of-words Word2Vec approach trained on about 12.4 million patents, the authors expand a baseline green-technology dictionary and apply regex matching to extract true greens. They find that only about 18.5% of literature-classified green patents are genuinely environmental, with true greens comprising roughly 2.5–3.5% of total patents over 2010–2022. Incorporating novelty measures (Arts et al., 2021) raises the share of true greens to about 49% of the total under the novelty lens, illustrating substantial under- and over-classification biases in standard schemes. Linking true-green patent ownership to EU firm data through propensity score matching shows that firms with at least one true-green patent exhibit substantially higher sales, market shares, and labor productivity, while profitability effects are mixed and sensitive to robustness checks and firm size; high-novelty true greens tend to amplify sales but may lag in profitability due to R&D costs. The study demonstrates the value of text analysis for more precise policy evaluation and resource allocation, and it highlights the potential for greenwashing if reliance on broad classifications persists.
Abstract
This paper introduces a Natural Language Processing approach for identifying ``true'' green patents from official supporting documents. We start our training on about 12.4 million patents that had been classified as green by previous literature. We then train a simple neural network to enlarge a baseline dictionary through vector representations of expressions related to environmental technologies. After testing, we find that ``true'' green patents represent about 20\% of all patents classified as green by previous literature. We show heterogeneity by technological class, and then find that ``true'' green patents are about 1\% less cited by subsequent inventions. In the second part of the paper, we test the relationship between patenting and a dashboard of firm-level financial accounts in the European Union. After controlling for reverse causality, we show that holding at least one ``true'' green patent raises sales, market shares, and productivity. If we restrict the analysis to high-novelty ``true'' green patents, we find that they also yield higher profits. Our findings underscore the importance of using text analyses to gauge finer-grained patent classifications that are useful for policymaking in different domains.
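As a rough illustration of the dictionary-expansion and matching idea summarized above, the following minimal sketch enlarges a seed list with terms whose vectors are close in cosine similarity and then flags texts with a regular expression. Everything in it is an assumption made for illustration: the three-dimensional vectors in `EMBEDDINGS` are invented toy values standing in for trained Word2Vec embeddings, and the function names and the 0.98 threshold are hypothetical rather than taken from the paper.

```python
import math
import re

# Toy 3-dimensional vectors (invented for illustration only). In a real
# pipeline these would come from a Word2Vec (CBOW) model trained on patent text.
EMBEDDINGS = {
    "solar": [0.9, 0.1, 0.0],
    "photovoltaic": [0.85, 0.15, 0.05],
    "wind turbine": [0.7, 0.3, 0.1],
    "recycling": [0.6, 0.5, 0.0],
    "gearbox": [0.0, 0.2, 0.9],
    "piston": [0.05, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_dictionary(seeds, embeddings, threshold=0.98):
    """Add every vocabulary term whose vector is close to some seed term.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    expanded = set(seeds)
    for seed in seeds:
        if seed not in embeddings:
            continue  # seed has no vector; keep it but cannot expand from it
        for term, vec in embeddings.items():
            if cosine(embeddings[seed], vec) >= threshold:
                expanded.add(term)
    return expanded


def compile_matcher(terms):
    """Build one case-insensitive regex that matches any term as a whole word."""
    # Longer terms first so multi-word expressions win over their substrings.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def mentions_green_term(text, matcher):
    """Return True if the text contains at least one dictionary term."""
    return matcher.search(text) is not None


green_terms = expand_dictionary(["solar"], EMBEDDINGS)
matcher = compile_matcher(green_terms)
flagged = mentions_green_term("A photovoltaic module with improved efficiency", matcher)
```

The whole-word anchors keep a term such as "solar" from matching inside unrelated words, which matters when the expanded dictionary contains short expressions.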
