Predicting Company Growth using Scaling Theory informed Machine Learning
Ruyi Tao, Veronica R. Cappelli, Kaiwei Liu, Marcus J. Hamilton, Christopher P. Kempes, Geoffrey B. West, Jiang Zhang
TL;DR
This work introduces STIML, a hybrid framework that forecasts company growth by decomposing its dynamics into a mechanistic trend, given by a generalized scaling growth model (GM), and learnable fluctuations, captured by data-driven models. For each financial indicator $x$, the GM predictor assumes power-law scaling with assets $A$, $x=c_xA^{\beta_x}$, and these relations are combined via an Euler-based solution to obtain the prediction $x^{GM}$. STIML then models the residuals $\mathbf{Y}-\mathbf{X}^{GM}$ with encoder/decoder architectures such as GM-MLP or GM-iTransformer. Across 31,553 firms (1950–2019) and 16 indicators, STIML achieves higher predictive accuracy than both the GM alone and purely data-driven baselines, with larger gains for large, stable firms and in high-volatility regimes. It is also interpretable through its latent representations and SHAP-based feature attributions. The results suggest that macroeconomic factors provide limited predictive value on average at the firm level, while asymmetries in deviations from scaling laws reveal learnable structure, pointing to two directions for future work: refining mechanistic models and incorporating asymmetric fluctuations. Overall, STIML demonstrates that the predictability of company growth is regime-dependent and offers a principled framework for combining mechanistic insight with flexible learning in complex economic time series.
Abstract
Predicting company growth is a critical yet challenging task because observed dynamics blend an underlying structural growth trend with volatile fluctuations. Here, we propose a Scaling-Theory-Informed Machine Learning (STIML) framework that integrates a scaling-based growth model, which captures the mechanism-driven average trend, with a data-driven forecasting model that learns the residual fluctuations. Using Compustat annual financial statement data (1950–2019) for 31,553 North American companies, we extend the growth model beyond assets to multiple financial indicators and evaluate STIML against growth-model-only and purely data-driven baselines. Across 16 target variables, we show that company growth exhibits a clear separation between trend-driven predictability and fluctuation-driven predictability, with their relative importance depending strongly on company size and volatility. Interpretability analyses further show that STIML captures multivariate dependencies beyond simple autocorrelation, and that macroeconomic variables contribute comparatively little to predictive performance on average. Moreover, we find that the scaling-based growth model overlooks asymmetric deviations from the scaling laws, and that these deviations contain structured, learnable signals, suggesting a path to refining mechanistic models.
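The trend-plus-residual decomposition described above can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the function names (`fit_power_law`, `gm_trend`, `fit_residual_model`) are hypothetical, the power law $x=c_xA^{\beta_x}$ is fitted by ordinary least squares in log-log space (an assumption), the Euler-based growth step is omitted, and a plain linear least-squares map stands in for residual learners such as GM-MLP or GM-iTransformer.

```python
import numpy as np

def fit_power_law(assets, x):
    """Fit x = c * A**beta by least squares in log-log space (assumed estimator)."""
    beta, log_c = np.polyfit(np.log(assets), np.log(x), 1)
    return np.exp(log_c), beta

def gm_trend(assets, c, beta):
    """Scaling-based trend for one indicator: c * A**beta."""
    return c * assets ** beta

def fit_residual_model(features, residuals):
    """Linear least-squares stand-in for the learned residual model."""
    design = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(design, residuals, rcond=None)
    return weights

def predict_residual(features, weights):
    design = np.column_stack([features, np.ones(len(features))])
    return design @ weights

# Synthetic data (illustrative only): a power-law trend plus a fluctuation
# driven by one observed feature.
rng = np.random.default_rng(0)
assets = rng.uniform(10.0, 1000.0, size=500)
feature = rng.normal(size=500)
x = 2.0 * assets ** 0.8 + 0.5 * feature

# Step 1: mechanistic trend from the scaling relation.
c_hat, beta_hat = fit_power_law(assets, x)
x_gm = gm_trend(assets, c_hat, beta_hat)

# Step 2: learn the residual Y - X^GM from the feature.
weights = fit_residual_model(feature, x - x_gm)
x_stiml = x_gm + predict_residual(feature, weights)

# In-sample mean squared errors of the trend alone and of trend + residual.
mse_gm = np.mean((x - x_gm) ** 2)
mse_stiml = np.mean((x - x_stiml) ** 2)
print(f"c={c_hat:.3f} beta={beta_hat:.3f} mse_gm={mse_gm:.4f} mse_stiml={mse_stiml:.4f}")
```

Because the residual model is fitted on top of a fixed trend, the combined in-sample error cannot exceed that of the trend alone; whether the gain carries over out of sample depends on how much structure the fluctuations actually contain.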
