Verifiable Boosted Tree Ensembles

Stefano Calzavara; Lorenzo Cazzaro; Claudio Lucchese; Giulio Ermanno Pibiri

TL;DR

Verifiable Boosted Tree Ensembles studies robustness verification for boosted tree models. It extends verifiable learning beyond hard majority-voting ensembles to gradient-boosted models such as those trained with XGBoost or LightGBM. The authors show that, for the restricted class of large-spread boosted ensembles, robustness can be verified exactly in polynomial time against $L_\infty$-norm attackers, whereas verification is NP-hard for other norm-based attackers. For general $L_p$ attackers they give a pseudo-polynomial time algorithm. They implement CARVE-GBM, a verification tool, and extend LightGBM to train large-spread ensembles. On public datasets, these ensembles remain accurate enough for practical use while being efficiently verifiable. Overall, the work makes robustness verification of boosted ensembles scalable and provides software support for security-aware boosting.

Abstract

Verifiable learning advocates for training machine learning models amenable to efficient security verification. Prior research demonstrated that specific classes of decision tree ensembles -- called large-spread ensembles -- allow for robustness verification in polynomial time against any norm-based attacker. This study expands prior work on verifiable learning from basic ensemble methods (i.e., hard majority voting) to advanced boosted tree ensembles, such as those trained using XGBoost or LightGBM. Our formal results indicate that robustness verification is achievable in polynomial time when considering attackers based on the $L_\infty$-norm, but remains NP-hard for other norm-based attackers. Nevertheless, we present a pseudo-polynomial time algorithm to verify robustness against attackers based on the $L_p$-norm for any $p \in \mathbb{N} \cup \{0\}$, which performs very well in practice. Our experimental evaluation shows that large-spread boosted ensembles are accurate enough for practical adoption, while being amenable to efficient security verification.
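To make the term "pseudo-polynomial" concrete, here is one standard way such a bound can arise: a multiple-choice knapsack dynamic program over integer attack costs. This is a hedged sketch, not the paper's algorithm. It makes three hypothetical assumptions:

- Each tree's attack options have already been computed as pairs `(cost, gain)`.
- `gain` is how much moving that tree to a reachable leaf lowers the margin $y \cdot f(\vec{x})$, where $f$ denotes the ensemble's summed score (notation assumed here).
- `cost` is an integer share of the attacker's budget (for instance, the number of modified features under $L_0$).

It further assumes that the costs of different trees simply add up. This holds when no perturbed feature influences more than one tree. The names `max_total_gain` and `robust_lp` are illustrative.

```python
from typing import List, Tuple

Option = Tuple[int, float]  # (integer attack cost, margin reduction); hypothetical encoding


def max_total_gain(options: List[List[Option]], budget: int) -> float:
    """Pick exactly one option per tree with total cost <= budget, maximising total gain.

    Every tree's option list must contain (0, 0.0), meaning "leave this tree alone".
    Runs in O(len(options) * budget * options per tree): polynomial in the numeric
    value of the budget rather than in its bit length, hence pseudo-polynomial.
    """
    neg_inf = float("-inf")
    # best[c] = largest total gain whose total cost is exactly c
    best = [0.0] + [neg_inf] * budget
    for tree_options in options:
        nxt = [neg_inf] * (budget + 1)
        for spent, gain_so_far in enumerate(best):
            if gain_so_far == neg_inf:
                continue  # cost level not reachable with the trees seen so far
            for cost, gain in tree_options:
                total = spent + cost
                if total <= budget:
                    nxt[total] = max(nxt[total], gain_so_far + gain)
        best = nxt
    return max(best)


def robust_lp(margin: float, options: List[List[Option]], budget: int) -> bool:
    """Robust iff the margin y * f(x) stays positive after the best affordable attack."""
    return margin - max_total_gain(options, budget) > 0


if __name__ == "__main__":
    trees = [[(0, 0.0), (1, 2.0)], [(0, 0.0), (1, 2.0), (2, 3.0)]]
    print(max_total_gain(trees, 2))  # 4.0: spend one unit on each tree
    print(robust_lp(2.0, trees, 1))  # False: one unit of budget already cancels the margin
```

The table is indexed by exact cost so that each tree contributes exactly one choice. Reading off `max(best)` at the end then gives the best attack within budget.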

Paper Structure

This paper contains 33 sections, 3 theorems, 11 equations, 3 figures, and 4 tables.

Introduction
Background
Supervised Learning
Boosted Tree Ensembles
Classifier Robustness
Robustness Verification of Tree Ensembles
Robustness Verification of Large-Spread Boosted Ensembles
Optimization Problem
Basic Verification Algorithm
Complexity.
Efficient Verification Algorithm
Complexity.
Solving the Optimization Problem
Solution for $L_\infty$-Attackers
Complexity.
...and 18 more sections

Key Result

Theorem 1

The basic verification algorithm $BV(T,\vec{x},y,p,k)$ returns True if and only if $T$ is robust on the instance $\vec{x}$ with true label $y$ against the attacker $A_{p,k}$.
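Theorem 1 concerns the paper's basic algorithm $BV$, whose steps are not reproduced in this summary. As a point of comparison, the following hedged sketch is a brute-force exact check for $p = \infty$ that makes no large-spread assumption. Every tree splits on axis-aligned thresholds, so the ensemble's output is constant on the grid cells that those thresholds cut out of the box $\|\vec{z} - \vec{x}\|_\infty \le k$. Testing one point per cell therefore decides robustness. The following are all assumptions for illustration:

- the tree encoding (a `Node` goes left when `x[feature] <= threshold`);
- the labels $y \in \{-1, +1\}$, with prediction given by the sign of the summed leaf values;
- the names `Leaf`, `Node`, and `brute_force_robust_linf`.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Union


@dataclass
class Leaf:
    value: float


@dataclass
class Node:
    feature: int
    threshold: float
    left: "Tree"   # followed when x[feature] <= threshold
    right: "Tree"  # followed when x[feature] > threshold


Tree = Union[Leaf, Node]


def predict_tree(t: Tree, x: Sequence[float]) -> float:
    while isinstance(t, Node):
        t = t.left if x[t.feature] <= t.threshold else t.right
    return t.value


def collect_thresholds(t: Tree, out: Dict[int, Set[float]]) -> Dict[int, Set[float]]:
    if isinstance(t, Node):
        out.setdefault(t.feature, set()).add(t.threshold)
        collect_thresholds(t.left, out)
        collect_thresholds(t.right, out)
    return out


def brute_force_robust_linf(ensemble: List[Tree], x: Sequence[float], y: int, k: float) -> bool:
    """Exact but exponential check that y * sum_t t(z) > 0 for all z with ||z - x||_inf <= k."""
    thresholds: Dict[int, Set[float]] = {}
    for t in ensemble:
        collect_thresholds(t, thresholds)
    candidates = []
    for f, xf in enumerate(x):
        lo, hi = xf - k, xf + k
        # Each threshold v in [lo, hi] represents the cell ending at v, and hi represents
        # the cell containing the upper end of the box; together they cover every cell.
        points = {xf, hi} | {v for v in thresholds.get(f, set()) if lo <= v <= hi}
        candidates.append(sorted(points))
    for z in itertools.product(*candidates):
        if y * sum(predict_tree(t, z) for t in ensemble) <= 0:
            return False  # z is an adversarial example
    return True
```

For example, take two stumps `Node(0, 0.5, Leaf(-1.0), Leaf(1.0))` and `Node(1, 0.5, Leaf(-1.0), Leaf(1.0))`, with $\vec{x} = (1, 1)$ and $y = 1$. The check returns `True` for $k = 0.2$ and `False` for $k = 0.6$, because moving either feature down to 0.5 already cancels the margin. The candidate set is a Cartesian product over features, so its size grows exponentially with the number of features. This is why such exhaustive search does not scale.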

Figures (3)

Figure 1: Example of regression tree.
Figure 2: Correctness of robustness verification for the efficient algorithm $EV$.
Figure 3: Speedup of the robustness verification time enabled by the use of CARVE-GBM over competitors.

Theorems & Definitions (9)

Definition 1: Robustness
Definition 2: Large-Spread Ensemble [CalzavaraCPP23]
Definition 3: Adversarial Gain
Theorem 1
Proof
Theorem 2
Proof
Theorem 3
Proof
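Definition 2 is only listed here. The sketch below illustrates, under stated assumptions, why large spread helps against $L_\infty$ attackers. It assumes the following reading of the prior work [CalzavaraCPP23], which is not a quote of Definition 2:

- The spread of an ensemble is the smallest distance between thresholds that two different trees place on the same feature.
- The ensemble is large-spread for budget $k$ when this spread exceeds $2k$.

Under that condition, for every feature at most one tree has thresholds inside $[x_f - k, x_f + k]$. Under $L_\infty$ each feature can be moved independently, so each tree can be attacked on its own. Robustness then holds exactly when the sum of each tree's worst reachable value of $y \cdot \text{leaf}$ stays positive. All names (`spread`, `is_large_spread`, `worst_leaf`, `verify_linf_large_spread`) and the tree encoding are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple, Union


@dataclass
class Leaf:
    value: float


@dataclass
class Node:
    feature: int
    threshold: float
    left: "Tree"   # followed when x[feature] <= threshold
    right: "Tree"  # followed when x[feature] > threshold


Tree = Union[Leaf, Node]
Interval = Tuple[float, float, bool]  # (lo, hi, lo_is_open): [lo, hi] or (lo, hi]


def thresholds_by_feature(t: Tree, out: Dict[int, Set[float]]) -> Dict[int, Set[float]]:
    if isinstance(t, Node):
        out.setdefault(t.feature, set()).add(t.threshold)
        thresholds_by_feature(t.left, out)
        thresholds_by_feature(t.right, out)
    return out


def spread(ensemble: List[Tree]) -> float:
    """Smallest |v - w| over thresholds v, w on the same feature in two different trees."""
    per_tree = [thresholds_by_feature(t, {}) for t in ensemble]
    best = math.inf
    for i in range(len(per_tree)):
        for j in range(i + 1, len(per_tree)):
            for f, vs in per_tree[i].items():
                for v in vs:
                    for w in per_tree[j].get(f, set()):
                        best = min(best, abs(v - w))
    return best


def is_large_spread(ensemble: List[Tree], k: float) -> bool:
    return spread(ensemble) > 2 * k


def _nonempty(lo: float, hi: float, lo_open: bool) -> bool:
    return lo < hi or (lo == hi and not lo_open)


def worst_leaf(t: Tree, box: Dict[int, Interval], y: int) -> float:
    """Minimum of y * leaf value over the leaves of t reachable from box."""
    if isinstance(t, Leaf):
        return y * t.value
    lo, hi, lo_open = box[t.feature]
    best = math.inf
    left_hi = min(hi, t.threshold)  # left branch needs z <= threshold
    if _nonempty(lo, left_hi, lo_open):
        best = min(best, worst_leaf(t.left, {**box, t.feature: (lo, left_hi, lo_open)}, y))
    # right branch needs z > threshold
    r_lo, r_open = (t.threshold, True) if t.threshold >= lo else (lo, lo_open)
    if _nonempty(r_lo, hi, r_open):
        best = min(best, worst_leaf(t.right, {**box, t.feature: (r_lo, hi, r_open)}, y))
    return best


def verify_linf_large_spread(ensemble: List[Tree], x: Sequence[float], y: int, k: float) -> bool:
    """Exact L_inf robustness check, valid only for large-spread ensembles."""
    if not is_large_spread(ensemble, k):
        raise ValueError("ensemble is not large-spread for this k")
    box = {f: (xf - k, xf + k, False) for f, xf in enumerate(x)}
    # Trees can be attacked independently, so the worst case is a sum of per-tree minima.
    return sum(worst_leaf(t, box, y) for t in ensemble) > 0
```

`worst_leaf` tracks a per-feature interval along each path. A leaf therefore counts as reachable only when all of its split conditions can hold at once inside the box. The work is linear in the total number of tree nodes per instance, in contrast with exhaustive search. For instance, two stumps that split on different features have infinite spread, so the check always applies. Two stumps on the same feature with thresholds 0.5 and 0.6 are large-spread only for $k < 0.05$.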
