TSB-HB: A Hierarchical Bayesian Extension of the TSB Model for Intermittent Demand Forecasting
Zong-Han Bai, Po-Yen Chu
TL;DR
TSB-HB addresses intermittent demand forecasting by combining the intuitive multiplicative structure of TSB with hierarchical Bayesian priors. It models demand occurrence via a Beta-Binomial layer and nonzero demand sizes via a Normal distribution on the log scale (equivalently, Log-Normal sizes), enabling partial pooling across items through empirical Bayes updates. The approach yields closed-form updates with linear-time per-epoch forecasting and fully probabilistic forecasts, combining interpretability with calibrated uncertainty. Empirical results on Online Retail and M5 show competitive or superior accuracy and stronger calibration for sparse, lumpy series, demonstrating the practical value of hierarchical shrinkage without sacrificing scalability or transparency.
Abstract
Intermittent demand forecasting poses unique challenges due to sparse observations, cold-start items, and obsolescence. Classical models such as Croston, SBA, and the Teunter-Syntetos-Babai (TSB) method provide simple heuristics but lack a principled generative foundation. Deep learning models address these limitations but often require large datasets and sacrifice interpretability. We introduce TSB-HB, a hierarchical Bayesian extension of TSB. Demand occurrence is modeled with a Beta-Binomial distribution, while nonzero demand sizes follow a Log-Normal distribution. Crucially, hierarchical priors enable partial pooling across items, stabilizing estimates for sparse or cold-start series while preserving heterogeneity. This framework yields a fully generative and interpretable model that generalizes classical exponential smoothing. On the UCI Online Retail dataset, TSB-HB achieves lower RMSE and RMSSE than Croston, SBA, TSB, ADIDA, IMAPA, ARIMA, and Theta, and on a subset of the M5 dataset it outperforms all classical baselines we evaluate. By combining a generative formulation with hierarchical shrinkage, the model provides calibrated probabilistic forecasts and improved accuracy on intermittent and lumpy items while remaining interpretable and scalable.
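To make the idea of partial pooling concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes fixed, hypothetical hyperparameters (a Beta(a, b) prior on the occurrence probability, a Normal(m, tau2) prior on the item's mean log-size, and a known log-scale variance sigma2) and omits the empirical Bayes step that would estimate them from all items. The function name and its parameters are invented for illustration.

```python
import math

def shrunk_forecast(demand, a=1.0, b=1.0, m=0.0, tau2=1.0, sigma2=0.25):
    """Illustrative one-step forecast for a single item.

    All hyperparameters (a, b, m, tau2, sigma2) are hypothetical fixed
    values; a full hierarchical model would estimate them across items.
    """
    n = len(demand)
    sizes = [d for d in demand if d > 0]
    k = len(sizes)

    # Beta-Binomial layer: posterior mean of the occurrence probability.
    # With few observations the estimate stays close to the prior mean a / (a + b).
    p_hat = (a + k) / (a + b + n)

    # Normal layer on log sizes: conjugate posterior mean of the item's
    # mean log-size, a precision-weighted blend of prior mean and data.
    log_sum = sum(math.log(d) for d in sizes)
    precision = 1.0 / tau2 + k / sigma2
    mu_hat = (m / tau2 + log_sum / sigma2) / precision

    # Plug-in Log-Normal mean (ignores posterior uncertainty in mu_hat).
    size_hat = math.exp(mu_hat + sigma2 / 2.0)

    # TSB-style multiplicative forecast: occurrence probability times size.
    return p_hat * size_hat, p_hat, mu_hat


if __name__ == "__main__":
    forecast, p_hat, mu_hat = shrunk_forecast([0, 2, 0, 2])
    print(f"forecast={forecast:.3f}, p_hat={p_hat:.3f}, mu_hat={mu_hat:.3f}")
```

For an item with no sales history, both estimates fall back to the prior, which is the cold-start behavior that shrinkage is meant to provide. As observations accumulate, the data terms dominate and the estimates move toward the item's own history.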
