Uncovering key predictors of high-growth firms via explainable machine learning
Yiwei Huang, Shuqi Xu, Linyuan Lü, Andrea Zaccaria, Manuel Sebastian Mariani
TL;DR
This study addresses the prediction of high-growth firms (HGFs) by integrating three feature classes (financial performance, patent-based technological attributes, and network-based measures) within ensemble machine learning models equipped with explainable AI (XAI) tools. Using 5,071 matched firms and 7-year time windows, the authors show that adding technological and/or network features to financial signals improves predictive performance, with the best results when both are combined. Among non-financial features, the maximum economic value of a firm’s granted patents and the number of patents tied to its primary technologies emerge as the strongest predictors; network-based metrics are generally less predictive, with the number of primary-technology patents as the notable exception. Partial dependence plot (PDP) and SHAP analyses reveal nonlinear relationships and context-dependent effects, such as HGF probability plateauing beyond a certain firm size and high-value patents benefiting larger firms more, offering practical guidance for investment and resource allocation.
Abstract
Predicting high-growth firms has attracted increasing interest from the technological forecasting and machine learning communities. Most existing studies rely primarily on financial data for these predictions. However, research suggests that a firm's research and development activities and its network position within technological ecosystems may also serve as valuable predictors. To unpack the relative importance of diverse features, this paper analyzes financial and patent data from 5,071 firms, extracting three categories of features: financial features, technological features of granted patents, and network-based features derived from firms' connections to their primary technologies. Using ensemble learning algorithms, we demonstrate that combining financial features with technological features, network-based features, or both leads to more accurate high-growth firm predictions than using financial features alone. We then evaluate the predictive power of each individual feature within its category using explainable artificial intelligence methods. Among non-financial features, the maximum economic value of a firm's granted patents and the number of patents related to a firm's primary technologies stand out for their importance. Furthermore, firm size is positively associated with high-growth probability up to a certain threshold size, after which the association plateaus. Conversely, the maximum economic value of a firm's granted patents is positively linked to high-growth probability only after a threshold value is exceeded. These findings elucidate the complex predictive role of various features in forecasting high-growth firms and could inform technological resource allocation as well as investment decisions.
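As a rough illustration of how a partial dependence analysis can surface the plateau and threshold patterns described above, the sketch below computes partial dependence curves by hand for a toy scoring function. Everything in it is an assumption made for illustration: `toy_hgf_probability`, its numeric thresholds, and the synthetic `firms` data are hypothetical and are not the authors' model, features, or results.

```python
import random


def toy_hgf_probability(size, max_patent_value):
    """Hypothetical stand-in for a fitted classifier (NOT the paper's model).

    The size effect rises until size reaches 5 and is flat afterwards;
    the patent effect switches on only above a value of 3.
    """
    size_effect = min(size, 5.0) / 5.0 * 0.3
    patent_effect = 0.2 if max_patent_value > 3.0 else 0.0
    return 0.1 + size_effect + patent_effect


def partial_dependence(model, rows, feature_index, grid):
    """Average model output over all rows with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite the feature of interest
            total += model(*modified)
        curve.append(total / len(rows))
    return curve


# Synthetic firms as (size, max_patent_value) pairs; purely illustrative data.
random.seed(0)
firms = [(random.uniform(0, 10), random.uniform(0, 6)) for _ in range(500)]

size_grid = [0, 2.5, 5, 7.5, 10]
value_grid = [0, 1, 2, 3, 4, 5, 6]

size_curve = partial_dependence(toy_hgf_probability, firms, 0, size_grid)
value_curve = partial_dependence(toy_hgf_probability, firms, 1, value_grid)

print("size curve: ", [round(p, 3) for p in size_curve])
print("value curve:", [round(p, 3) for p in value_curve])
```

In this toy setting the size curve increases and then flattens, while the patent-value curve stays flat until the threshold and then steps up. With a real fitted ensemble, the same averaging procedure would be applied to the model's predicted probabilities rather than to a hand-written function.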
