
Uncovering key predictors of high-growth firms via explainable machine learning

Yiwei Huang, Shuqi Xu, Linyuan Lü, Andrea Zaccaria, Manuel Sebastian Mariani

TL;DR

This study predicts high-growth firms (HGFs) by combining three feature classes (financial performance, patent-based technological attributes, and network-based measures) in ensemble machine learning models interpreted with explainable AI (XAI) tools. Using 5,071 matched firms and 7-year time windows, the authors show that adding technological features, network-based features, or both to financial features improves predictive performance, with the best results when both are added. Among non-financial features, the maximum economic value of a firm's granted patents and the number of patents tied to its primary technologies stand out as important predictors; the other network-based metrics are generally less predictive. Partial dependence plot (PDP) and SHAP analyses reveal nonlinear, context-dependent effects: the association between firm size and HGF probability plateaus beyond a threshold, and high-value patents benefit larger firms more. These findings offer practical guidance for investment and resource allocation.

Abstract

Predicting high-growth firms has attracted increasing interest from the technological forecasting and machine learning communities. Most existing studies rely primarily on financial data for these predictions. However, research suggests that a firm's research and development activities and its network position within technological ecosystems may also serve as valuable predictors. To unpack the relative importance of diverse features, this paper analyzes financial and patent data from 5,071 firms, extracting three categories of features: financial features, technological features of granted patents, and network-based features derived from firms' connections to their primary technologies. Using ensemble learning algorithms, we demonstrate that combining financial features with technological features, network-based features, or both leads to more accurate high-growth firm predictions than using financial features alone. We then evaluate the predictive power of each individual feature within its category using explainable artificial intelligence methods. Among non-financial features, the maximum economic value of a firm's granted patents and the number of patents related to a firm's primary technologies stand out for their importance. Furthermore, firm size is positively associated with high-growth probability up to a certain threshold size, after which the association plateaus. Conversely, the maximum economic value of a firm's granted patents is positively linked to high-growth probability only after a threshold value is exceeded. These findings elucidate the complex predictive role of various features in forecasting high-growth firms and could inform technological resource allocation as well as investment decisions.
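
The plateau and threshold effects described above are the kind of pattern a partial dependence curve exposes: one feature is forced to each value on a grid, the model's predicted probability is averaged over all firms, and the resulting curve is inspected. The following is a minimal sketch of that computation under stated assumptions, not the authors' pipeline. The function `toy_predict`, the feature names `size` and `max_patent_value`, and all numbers are hypothetical stand-ins for a fitted ensemble classifier and real firm data.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Copy every row, overriding only the feature under study.
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(row) for row in modified))
    return curve


def toy_predict(row):
    """Hypothetical stand-in for a fitted classifier's predicted HGF probability."""
    size_effect = min(row["size"], 5.0) / 10.0  # rises with size, flat once size >= 5
    patent_effect = 0.2 if row["max_patent_value"] > 3.0 else 0.0  # only above a threshold
    return 0.1 + size_effect + patent_effect


# Hypothetical firms; values are illustrative only.
rows = [
    {"size": 1.0, "max_patent_value": 0.5},
    {"size": 4.0, "max_patent_value": 2.0},
    {"size": 6.0, "max_patent_value": 4.0},
    {"size": 9.0, "max_patent_value": 5.0},
]

size_curve = partial_dependence(toy_predict, rows, "size", [0.0, 2.5, 5.0, 7.5, 10.0])
patent_curve = partial_dependence(toy_predict, rows, "max_patent_value", [1.0, 3.0, 4.0])

print("size:", [round(v, 3) for v in size_curve])
print("max_patent_value:", [round(v, 3) for v in patent_curve])
```

In this toy setting the size curve rises and then flattens, while the patent-value curve stays flat until the threshold is crossed. With a real model, the same averaging would be applied to the classifier's predicted probabilities.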

Paper Structure

This paper contains 38 sections, 18 equations, 9 figures, and 6 tables.

Figures (9)

  • Figure 1: Schematic visualization of the process we adopt to collect and organize the patent and financial datasets. More details of the matching process are provided in Appendix \ref{sec:appendix_matching}.
  • Figure 2: Overview of our 7-year time window organization of the data.
  • Figure 3: Predictive performance of the naïve classifiers under different growth indicators. Each bar represents an individual feature: red bars for financial features, blue bars for technological features, and green bars for network-based features. The error bars indicate the standard error of the classifiers' average performance, calculated over $100$ different shuffles of the dataset. The inset shows the top $7$ features with the best performance. Financial features have the strongest predictive performance; technological features, such as the maximum economic value of a firm's granted patents, are strongly associated with high economic growth.
  • Figure 4: Rankings of the top $12$ features with the highest Gini importance scores under different growth indicators. Each bar corresponds to a feature. The dashed lines represent the average importance scores across all the features considered. The error bars indicate the standard error of the features' average importance scores, calculated over $100$ iterations of $3$-fold cross-validation. The inset shows the importance rankings of all features. Growth in the number of employees is mainly associated with financial features, whereas the maximum economic value of a firm's granted patents is a key feature for predicting turnover and net income. Among the network-based features, the most important one is the number of patents related to a firm's primary technologies.
  • Figure 5: The SHAP values of the top $12$ features with the highest SHAP-based importance scores. There is a positive correlation between the feature values and SHAP values for most features, which indicates that firms with higher values in these features have a higher probability of achieving high economic growth.
  • ...and 4 more figures
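
Figure 4 ranks features by Gini importance. In tree ensembles, this score is built from the impurity decrease a feature achieves at each split, weighted by the share of samples reaching the split and aggregated over the trees. The sketch below shows only that building block for a single binary split. It is an illustrative assumption about the general technique, not the paper's code, and the feature names and values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)  # equals 1 - p^2 - (1 - p)^2 for two classes


def impurity_decrease(values, labels, threshold):
    """Parent impurity minus the size-weighted impurity of the two children."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Hypothetical data: 1 marks a high-growth firm.
labels = [0, 0, 0, 1, 1, 1]
feature_a = [1, 2, 3, 4, 5, 6]  # separates the classes perfectly at threshold 3
feature_b = [1, 6, 2, 5, 3, 4]  # only weakly related to the label

print("feature_a:", round(impurity_decrease(feature_a, labels, 3), 4))
print("feature_b:", round(impurity_decrease(feature_b, labels, 3), 4))
```

A feature that separates the classes cleanly yields a larger decrease than a weakly related one. Accumulating such decreases over every split and tree gives the ranking shown in the figure.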