Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles
Youssouf Emine, Alexandre Forel, Idriss Malek, Thibaut Vidal
TL;DR
The paper tackles the challenge of deploying large tree ensembles by introducing functionally-identical pruning, which preserves the exact prediction function $H(x; \alpha)$ after pruning. It proposes FIPE, an iterative framework that alternates between two components. The first is a faithful pruner that operates on a finite set of points. The second is a separation oracle that either certifies faithfulness over the whole input space or returns a separating input on which the pruned and original ensembles disagree. Two pruning variants are provided: an exact $\|w\|_0$ formulation and a scalable $\|w\|_1$ LP relaxation. Key contributions include formalizing the faithfulness constraint for additive tree ensembles, deriving a tractable separation oracle via path-tracking for trees, and demonstrating substantial lossless compression of AdaBoost, Random Forest, XGBoost, and LightGBM ensembles. Because the prediction function is unchanged, the approach reduces memory and inference time while leaving predictive performance exactly the same. This makes it a route to model compression with faithfulness guarantees.
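To make the two pruning variants concrete, here is one way to write the pruner on a finite point set $S$. This is a sketch under stated assumptions, not the paper's exact model. It assumes a classification ensemble of $T$ trees in which tree $t$ assigns score $h_{t,k}(x)$ to class $k$, nonnegative weights $w$, no ties in the original prediction on $S$, and a fixed margin $\varepsilon > 0$ standing in for strict inequality:

$$
\min_{w \ge 0} \; \|w\|_1 = \sum_{t=1}^{T} w_t
\quad \text{s.t.} \quad
\sum_{t=1}^{T} w_t \bigl( h_{t,c(x)}(x) - h_{t,k}(x) \bigr) \ge \varepsilon
\quad \forall x \in S,\; \forall k \ne c(x),
$$

where $c(x) = \arg\max_k \sum_{t} \alpha_t h_{t,k}(x)$ is the class predicted by the original ensemble $H(x; \alpha)$.

Positive rescaling of $\alpha$ does not change the predicted class. As a result, a suitably rescaled $\alpha$ is feasible whenever the original margins on $S$ are strictly positive, so the LP is never empty under these assumptions. The exact variant replaces $\|w\|_1$ with $\|w\|_0$, the number of trees kept. That objective can be modeled with binary indicator variables and solved as a mixed-integer program.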
Abstract
Tree ensembles, including boosting methods, are highly effective and widely used for tabular data. However, large ensembles lack interpretability and require longer inference times. We introduce a method to prune a tree ensemble into a reduced version that is "functionally identical" to the original model. In other words, our method guarantees that the prediction function stays unchanged for any possible input. As a consequence, this pruning algorithm is lossless for any aggregated metric. We formalize the problem of functionally identical pruning on ensembles, introduce an exact optimization model, and provide a fast yet highly effective method to prune large ensembles. Our algorithm iteratively prunes the ensemble over a finite set of points, which is incrementally augmented using an adversarial model. In multiple computational experiments, we show that our approach is a "free lunch", significantly reducing the ensemble size without altering the model's behavior. Thus, we can preserve state-of-the-art performance at a fraction of the original model's size.
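The alternation between pruning on a finite point set and an adversarial check can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and every name in it is hypothetical:

- Trees are reduced to depth-one stumps `(feature, threshold, left_class, right_class)`.
- `greedy_prune` is a simple greedy heuristic that only zeroes out weights. It stands in for the $\|w\|_0$ and $\|w\|_1$ models, which can also reweight the remaining trees.
- `grid_oracle` is a brute-force separation oracle that tests one point per cell of the split-threshold grid. Axis-aligned splits partition the input space into finitely many cells, so this oracle is exact, but its cost grows exponentially with the number of features.

```python
from itertools import product

# Hypothetical tree representation: a depth-one stump
# (feature, threshold, class if x[feature] <= threshold, class otherwise).


def predict(stumps, weights, x, n_classes):
    """Weighted vote H(x; weights); ties break toward the smallest class index."""
    votes = [0.0] * n_classes
    for (f, thr, left, right), w in zip(stumps, weights):
        votes[left if x[f] <= thr else right] += w
    return max(range(n_classes), key=lambda c: votes[c])


def agrees_on(stumps, alpha, w, points, n_classes):
    """True if the pruned weights w reproduce the original prediction on every point."""
    return all(
        predict(stumps, alpha, x, n_classes) == predict(stumps, w, x, n_classes)
        for x in points
    )


def greedy_prune(stumps, alpha, points, n_classes):
    """Stand-in pruner: zero out trees one at a time while staying faithful on `points`."""
    w = list(alpha)
    for t in range(len(stumps)):
        trial = w[:t] + [0.0] + w[t + 1:]
        if agrees_on(stumps, alpha, trial, points, n_classes):
            w = trial
    return w


def grid_oracle(stumps, alpha, w, n_features, n_classes):
    """Brute-force separation oracle: test one representative point per grid cell.

    Returns a separating input, or None if w is faithful on the whole input space.
    """
    axes = []
    for f in range(n_features):
        ts = sorted({thr for (g, thr, _, _) in stumps if g == f})
        if not ts:  # feature never used in a split: any value will do
            axes.append([0.0])
            continue
        # One point in (-inf, t1], one inside each (t_i, t_{i+1}], one in (t_k, inf).
        mids = [(a + b) / 2 for a, b in zip(ts, ts[1:])]
        axes.append([ts[0] - 1.0] + mids + [ts[-1] + 1.0])
    for x in product(*axes):
        if predict(stumps, alpha, x, n_classes) != predict(stumps, w, x, n_classes):
            return list(x)
    return None


def fipe_sketch(stumps, alpha, points, n_features, n_classes):
    """Alternate pruning on a finite point set with the oracle until certified."""
    points = [list(p) for p in points]
    while True:
        w = greedy_prune(stumps, alpha, points, n_classes)
        x = grid_oracle(stumps, alpha, w, n_features, n_classes)
        if x is None:
            return w
        points.append(x)  # separating input: prune again with it included


if __name__ == "__main__":
    # Toy ensemble: two identical stumps on feature 0 and a lighter stump on feature 1.
    stumps = [(0, 0.5, 0, 1), (0, 0.5, 0, 1), (1, 0.5, 0, 1)]
    alpha = [1.0, 1.0, 0.5]
    print(fipe_sketch(stumps, alpha, [(0.0, 0.0), (1.0, 1.0)], 2, 2))  # [0.0, 1.0, 0.0]
```

On the toy ensemble, the first round sees only two points and wrongly keeps just the stump on feature 1. The oracle then returns a separating input, and the second round keeps a single stump on feature 0, which the oracle certifies. The loop terminates because each separating input joins the point set and the grid is finite. Swapping in an LP or MILP pruner would change only `greedy_prune`.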
