BUFF: Boosted Decision Tree based Ultra-Fast Flow matching
Cheng Jiang, Sitian Qian, Huilin Qu
TL;DR
BUFF addresses the bottleneck of fast, high-dimensional tabular data simulation in high-energy physics by replacing the neural backbones of flow-based generative models with gradient boosted trees inside a conditional flow matching framework (flowBDT). The approach yields orders-of-magnitude speedups in training and inference on CPU while maintaining high fidelity on both high-level observables and low-level calorimeter and jet-constituent data. Conditional generation further improves correlation fidelity, which matters for unfolding tasks. Evaluations on JetNet, CaloChallenge, unfolding, and Schrödinger Bridge refinement show strong performance in end-to-end fast simulation, high-dimensional low-level generation, and conditional sampling, and the method is also applicable to tasks such as anomaly detection and jet tagging. Overall, BUFF provides a scalable, CPU-friendly surrogate for rapid, multi-level collider simulation, with potential impact on HL-LHC workflows and beyond.
Abstract
Tabular data is one of the most frequently encountered data types in high energy physics. Unlike typically homogeneous data such as pixelated images, high-dimensional tabular data is difficult to simulate while accurately capturing its correlations, even with the most advanced architectures. Motivated by findings that tree-based models outperform deep learning models on tasks specific to tabular data, we adopt conditional flow matching, a recent class of generative models, and employ several techniques to integrate Gradient Boosted Trees. Performance is evaluated on various tasks at different analysis levels using several public datasets. We demonstrate that training and inference for most high-level simulation tasks can be sped up by orders of magnitude. The approach can be extended to low-level feature simulation and conditional generation with competitive performance.
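
For readers unfamiliar with conditional flow matching, the regression problem that a tree-based model would solve can be written out as follows. This is only a sketch: it assumes the common linear interpolation path between a noise sample $x_0$ and a data sample $x_1$, and the symbols $v_\theta$, $x_t$, and $c$ (the conditioning variables) are notation introduced here. The paper itself may use a different path or parameterization.

```latex
\begin{aligned}
x_t &= (1 - t)\,x_0 + t\,x_1,
  \qquad x_0 \sim \mathcal{N}(0, I),\quad x_1 \sim p_{\text{data}}(x \mid c),\quad t \sim \mathcal{U}[0, 1], \\
\mathcal{L}_{\text{CFM}} &= \mathbb{E}_{t,\,c,\,x_0,\,x_1}
  \bigl\| v_\theta(x_t, t, c) - (x_1 - x_0) \bigr\|^2, \\
\frac{dx}{dt} &= v_\theta(x, t, c), \qquad x(0) = x_0 .
\end{aligned}
```

Because the loss is an ordinary squared-error regression onto the target $x_1 - x_0$, the velocity model $v_\theta$ can in principle be any regressor, which is what makes a gradient boosted tree backbone possible. New samples are then drawn by integrating the ODE from $t = 0$ to $t = 1$.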
