MorphBoost: Self-Organizing Universal Gradient Boosting with Adaptive Tree Morphing
Boris Kriuk
TL;DR
MorphBoost tackles the rigidity of static tree structures in gradient boosting by introducing dynamic, self-organizing tree morphing guided by gradient statistics and information-theoretic criteria. The method combines automatic problem fingerprinting, morphing split functions, vectorized batch prediction, and interaction-aware feature importance to adapt to dataset complexity and problem type (binary classification, multiclass classification, regression). On 10 diverse datasets, MorphBoost achieves a 0.84% average accuracy gain over XGBoost, with the lowest variance of all compared models and robust performance across difficulty levels, especially on high-dimensional problems. The results demonstrate practical benefits in both predictive performance and computational efficiency, with potential impact on scalable, adaptive ensembles in heterogeneous data environments.
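The fingerprinting step is only named here, not specified. As a minimal sketch, assuming a simple heuristic over the target vector, task-type detection for binary, multiclass, and regression problems could look like the following. The function `fingerprint_problem`, the `max_classes` threshold, and the loss names in `DEFAULT_LOSS` are hypothetical illustrations, not MorphBoost's actual rules.

```python
import numpy as np

# Hypothetical default loss per task type (names are illustrative only).
DEFAULT_LOSS = {"binary": "logloss", "multiclass": "softmax", "regression": "squared_error"}

def fingerprint_problem(X, y, max_classes=20):
    """Infer a coarse task fingerprint from (X, y).

    Heuristic sketch, not MorphBoost's published procedure: targets with few
    discrete values are treated as classification, everything else as regression.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n_unique = np.unique(y).size
    if y.dtype.kind == "f":
        # Float targets count as class labels only if every value is a whole number.
        discrete = bool(np.all(np.mod(y, 1) == 0))
    else:
        # Integer, boolean, and string targets are treated as labels.
        discrete = True

    if discrete and n_unique <= 2:
        task = "binary"
    elif discrete and n_unique <= max_classes:
        task = "multiclass"
    else:
        task = "regression"

    return {
        "task": task,
        "n_classes": n_unique if task != "regression" else None,
        "n_samples": X.shape[0],
        "n_features": X.shape[1],
        "loss": DEFAULT_LOSS[task],
    }

# Usage: the same feature matrix with three kinds of targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
print(fingerprint_problem(X, np.array([0, 1] * 50))["task"])
print(fingerprint_problem(X, np.arange(100) % 4)["task"])
print(fingerprint_problem(X, rng.normal(size=100))["task"])
```

A fingerprint like this could then drive parameter defaults (loss, tree depth, learning rate); which properties MorphBoost actually inspects, and how it maps them to parameters, is not stated in this summary.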
Abstract
Traditional gradient boosting algorithms employ static tree structures with fixed splitting criteria that remain unchanged throughout training, limiting their ability to adapt to evolving gradient distributions and problem-specific characteristics across different learning stages. This work introduces MorphBoost, a new gradient boosting framework featuring self-organizing tree structures that dynamically morph their splitting behavior during training. The algorithm implements adaptive split functions that evolve based on accumulated gradient statistics and iteration-dependent learning pressures, enabling automatic adjustment to problem complexity. Key innovations include: (1) a morphing split criterion that combines gradient-based scores with information-theoretic metrics, weighted by training progress; (2) automatic problem fingerprinting for intelligent parameter configuration across binary, multiclass, and regression tasks; (3) vectorized tree prediction that achieves significant computational speedups; (4) interaction-aware feature importance that detects multiplicative relationships; and (5) a fast mode that balances speed and accuracy. Comprehensive benchmarking across 10 diverse datasets against competitive models (XGBoost, LightGBM, GradientBoosting, HistGradientBoosting, and ensemble methods) demonstrates that MorphBoost achieves state-of-the-art performance, outperforming XGBoost by 0.84% on average. MorphBoost ranked first overall, with 4/10 dataset wins (40% win rate) and 6/30 top-3 finishes (20%), while achieving the lowest standard deviation (σ = 0.0948) and the highest minimum accuracy of all models, indicating superior consistency and robustness. Analysis by difficulty level shows competitive results on easy datasets and notable improvements on advanced problems, which we attribute to MorphBoost's greater degree of adaptation.
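The morphing split criterion is described here only qualitatively. The sketch below shows one possible reading, assuming a standard second-order gradient gain (as used in XGBoost), an information gain computed on the sign of the gradients, and a linear schedule w = t/T that shifts weight toward the information term as training progresses. The function names, the choice of entropy target, and the schedule direction are all illustrative assumptions, not the paper's formulas.

```python
import numpy as np

# Illustrative sketch only: the blend, the entropy target, and the t/T schedule
# are assumptions, not MorphBoost's published formulas.

def gradient_gain(g, h, mask, lam=1.0):
    """Second-order (XGBoost-style) gain for splitting rows into mask / ~mask."""
    def leaf_score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)
    return 0.5 * (leaf_score(g[mask], h[mask]) + leaf_score(g[~mask], h[~mask])
                  - leaf_score(g, h))

def sign_entropy(g):
    """Binary entropy (in bits) of the share of negative gradients."""
    if g.size == 0:
        return 0.0
    p = float(np.mean(g < 0))
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def info_gain(g, mask):
    """Reduction in gradient-sign entropy achieved by the split."""
    n, n_left = g.size, int(mask.sum())
    child = (n_left * sign_entropy(g[mask])
             + (n - n_left) * sign_entropy(g[~mask])) / n
    return sign_entropy(g) - child

def morphing_split_score(g, h, mask, t, T, lam=1.0):
    """Blend both criteria; the information term's weight grows linearly with t / T."""
    w = t / T
    return (1 - w) * gradient_gain(g, h, mask, lam) + w * info_gain(g, mask)

# Usage: one candidate split that separates negative from positive gradients.
g = np.array([-1.0, -0.8, -0.9, 0.7, 1.1, 0.9])
h = np.ones_like(g)  # unit Hessians, as for squared-error loss
mask = g < 0
for t in (0, 50, 100):
    print(t, morphing_split_score(g, h, mask, t, T=100))
```

In this example the split separates the gradient signs perfectly, so the blended score moves from the gradient gain (roughly 1.82) at t = 0 to the full information gain (1 bit) at t = T. Because the two terms live on different scales, a practical implementation would likely normalize them before blending; how MorphBoost handles this is not stated in the abstract.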
