Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles

Youssouf Emine; Alexandre Forel; Idriss Malek; Thibaut Vidal

TL;DR

The paper addresses the cost of deploying large tree ensembles by introducing functionally-identical pruning, which keeps the prediction function $H(x; \alpha)$ exactly unchanged after pruning. It proposes FIPE, an iterative framework that alternates between two components: a faithful pruner that operates on a finite set of points, and a separation oracle that either certifies faithfulness over the whole input space or returns an input on which the pruned and original ensembles disagree. The pruner comes in two variants: an exact $\|w\|_0$ formulation and a scalable $\|w\|_1$ LP relaxation. Key contributions include formalizing the faithfulness constraint for additive tree ensembles, deriving a tractable separation oracle via path-tracking for trees, and demonstrating substantial, lossless compression across AdaBoost, Random Forest, XGBoost, and LightGBM. Because the pruned model predicts identically on every input, the memory and inference-time savings come with no change in predictive performance.
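One way to write the pruning problem described above is sketched below. The symbols $h_m$ (the $m$-th tree), $M$ (number of trees), $\mathcal{X}$ (input space), and $X$ (the finite set of points) are notation assumed here for illustration; the paper's exact formulation, including tie-breaking and how the constraint is linearized for the $\|w\|_1$ variant, is not given in this summary and may differ.

$$H(x; \alpha) = \arg\max_{c} \sum_{m=1}^{M} \alpha_m \, \mathbb{1}[h_m(x) = c]$$

$$\min_{w \ge 0} \; \|w\|_0 \quad \text{s.t.} \quad H(x; w) = H(x; \alpha) \quad \forall x \in \mathcal{X}$$

Under this reading, the faithful pruner solves the same problem with $\mathcal{X}$ replaced by the finite set $X \subset \mathcal{X}$, and the scalable variant replaces the objective $\|w\|_0$ with $\|w\|_1$. A tree $m$ is removed whenever $w_m = 0$.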

Abstract

Tree ensembles, including boosting methods, are highly effective and widely used for tabular data. However, large ensembles lack interpretability and require longer inference times. We introduce a method to prune a tree ensemble into a reduced version that is "functionally identical" to the original model. In other words, our method guarantees that the prediction function stays unchanged for any possible input. As a consequence, this pruning algorithm is lossless for any aggregated metric. We formalize the problem of functionally identical pruning on ensembles, introduce an exact optimization model, and provide a fast yet highly effective method to prune large ensembles. Our algorithm iteratively prunes considering a finite set of points, which is incrementally augmented using an adversarial model. In multiple computational experiments, we show that our approach is a "free lunch", significantly reducing the ensemble size without altering the model's behavior. Thus, we can preserve state-of-the-art performance at a fraction of the original model's size.
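The alternation described in the abstract (prune on a finite set of points, then search for an input that breaks the pruned model, and repeat) can be illustrated on a toy problem. The sketch below is not the paper's implementation: the ensemble of one-dimensional decision stumps, the tie-breaking rule, and the function names `prune_on_points`, `separation_oracle`, and `fipe_toy` are all assumptions made for illustration, and brute-force enumeration stands in for both the optimization model and the adversarial model.

```python
from itertools import combinations

# Hypothetical toy ensemble: five decision stumps on one feature.
# Stump m votes class 1 when x > THRESHOLDS[m], else class 0.
THRESHOLDS = [1.0, 2.0, 2.0, 3.0, 4.0]
ALPHA = [1.0] * 5  # original (equal) weights


def predict(x, weights):
    """Weighted majority vote; ties go to class 0 (an assumption)."""
    votes_1 = sum(w for t, w in zip(THRESHOLDS, weights) if x > t)
    votes_0 = sum(w for t, w in zip(THRESHOLDS, weights) if x <= t)
    return 1 if votes_1 > votes_0 else 0


def prune_on_points(points):
    """Smallest subset of stumps that matches the original on `points`.

    Brute force over subsets, keeping the original weights: a stand-in
    for the optimization model, not the paper's formulation.
    """
    targets = [predict(x, ALPHA) for x in points]
    n = len(ALPHA)
    for size in range(1, n + 1):
        for kept in combinations(range(n), size):
            w = [ALPHA[m] if m in kept else 0.0 for m in range(n)]
            if [predict(x, w) for x in points] == targets:
                return w
    return list(ALPHA)


def separation_oracle(weights):
    """Return an input where the two ensembles disagree, or None.

    Every stump is constant between consecutive thresholds, so one
    representative per interval covers the whole input space.
    """
    cuts = sorted(set(THRESHOLDS))
    reps = [cuts[0] - 1.0]
    reps += [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
    reps += [cuts[-1] + 1.0]
    for x in reps:
        if predict(x, weights) != predict(x, ALPHA):
            return x
    return None


def fipe_toy(initial_points):
    """Alternate pruning and separation until the oracle finds nothing."""
    points = list(initial_points)
    while True:
        w = prune_on_points(points)
        x = separation_oracle(w)
        if x is None:
            return w, points
        points.append(x)


pruned_weights, final_points = fipe_toy([0.5, 4.5])
```

In this toy case the first pruning pass keeps only the stump with threshold 1.0, the oracle returns the separating input 1.5, and the second pass keeps a single stump with threshold 2.0, which the oracle certifies. The original five-stump vote and the one-stump model then agree on every input, which is the sense in which the pruning is lossless for any aggregated metric.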

Paper Structure (28 sections, 2 theorems, 16 equations, 3 figures, 5 tables, 1 algorithm)

Introduction
Problem Statement
Classification ensembles
Functionally-identical pruning
Pruning Algorithms
Pruning on a finite set of points
Minimal-size faithful pruner.
Efficient approximation.
Separation oracle for tree ensembles
Separation for tree ensembles.
Feature consistency.
Computational Experiments
Main results
Analysis of the pruned ensembles
Comparison with "non-faithful" baselines
...and 13 more sections

Key Result

Proposition 2.2

For additive tree ensembles, a broad class that includes random forests and boosting methods, the functionally-identical pruning problem (referenced in the source as opt:fipe) is NP-hard.

Figures (3)

Figure 1: A small ensemble made of three trees with equal weights. This ensemble can be pruned without any change in its prediction function by removing the first and third trees.
Figure 2: FIPE iterates between the pruning model and the separation oracle until it returns the set of weights of the pruned model.
Figure 3: Weights of learners in the original and pruned ensembles on the FICO dataset with $M=200$.

Theorems & Definitions (7)

Definition 2.1
Proposition 2.2
Theorem 3.1
Remark 3.2
Remark 3.3
Proof of Proposition 2.2
Proof of Theorem 3.1