High-Dimensional Analysis of Bootstrap Ensemble Classifiers
Hamza Cherkaoui, Malik Tiomoko, Mohamed El Amine Seddik, Cosme Louart, Ekkehard Schnoor, Balazs Kegl
TL;DR
This paper analyzes bootstrap ensembles of Least Squares Support Vector Machine (LSSVM) classifiers in the high-dimensional regime, using Random Matrix Theory to characterize the ensemble decision score. It derives the asymptotic normal distribution of the aggregated score, with explicit mean and variance terms, which yields a theoretical classification error and principled guidance for choosing the number of bootstrap subsets $m$ and the regularization parameter $\lambda$. The main contributions are the deterministic-equivalent framework, the closed-form expressions for the decision-score moments, and a practical model-selection protocol validated on synthetic and real datasets. Together these give a theory-driven way to tune high-dimensional bootstrap ensembles, with demonstrated improvements in predictive performance, particularly in settings with strong concentration properties. The paper also discusses limitations for feature types that violate the concentration assumptions, which motivates future work on non-linear or heterogeneous-data extensions.
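To make the link between a Gaussian decision score and a classification error concrete, here is a minimal sketch. It assumes the aggregated score for each class is normal with a known mean and standard deviation and that the decision rule thresholds the score at a fixed value. The function names, the threshold of 0, and the class-prior argument are illustrative assumptions, not the paper's notation or its closed-form moments.

```python
import math


def gaussian_tail(x: float) -> float:
    """Q(x) = P(Z > x) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def theoretical_error(mu_neg, sigma_neg, mu_pos, sigma_pos,
                      prior_neg=0.5, threshold=0.0):
    """Error of the rule 'predict positive if score > threshold'.

    Assumption (hypothetical setup): the score is N(mu_neg, sigma_neg^2) on the
    negative class and N(mu_pos, sigma_pos^2) on the positive class.
    """
    # Negative-class samples are misclassified when their score exceeds the threshold.
    false_positive = gaussian_tail((threshold - mu_neg) / sigma_neg)
    # Positive-class samples are misclassified when their score falls at or below it.
    false_negative = 1.0 - gaussian_tail((threshold - mu_pos) / sigma_pos)
    return prior_neg * false_positive + (1.0 - prior_neg) * false_negative


# Symmetric example: means -1 and +1, unit variance, balanced classes.
err = theoretical_error(-1.0, 1.0, 1.0, 1.0)
```

Given expressions for the score mean and variance as functions of $m$ and $\lambda$, one could evaluate such an error over a grid of candidate values and keep the minimizer, which is the spirit of the model-selection guidance described above.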
Abstract
Bootstrap methods have long been a cornerstone of ensemble learning in machine learning. This paper presents a theoretical analysis of bootstrap techniques applied to the Least Squares Support Vector Machine (LSSVM) ensemble in the regime where both the sample size and the feature dimension are large and growing. Leveraging tools from Random Matrix Theory, we investigate the performance of this classifier, which aggregates the decision functions of multiple weak classifiers, each trained on a different subset of the data. The analysis provides insight into how bootstrap methods behave in high-dimensional settings. Based on these findings, we propose strategies for selecting the number of subsets and the regularization parameter that maximize the performance of the LSSVM. Empirical experiments on synthetic and real-world datasets validate our theoretical results.
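The ensemble described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses a linear, bias-free LSSVM (which reduces to ridge regression on $\pm 1$ labels), draws each subset uniformly with replacement, and averages the resulting decision scores. The helper names, the subset size, and the synthetic two-class Gaussian data are all hypothetical choices.

```python
import numpy as np


def fit_linear_lssvm(X, y, lam):
    """Assumed simplification: linear LSSVM without bias, i.e. ridge regression on +/-1 labels."""
    n, p = X.shape
    # Solve (X^T X / n + lam * I) w = X^T y / n for the weight vector w.
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def bootstrap_ensemble_scores(X, y, X_test, m, lam, subset_size, rng):
    """Average the decision scores of m weak classifiers, each trained on a random subset."""
    scores = np.zeros(X_test.shape[0])
    for _ in range(m):
        # Assumption: subsets are sampled uniformly with replacement.
        idx = rng.integers(0, X.shape[0], size=subset_size)
        w = fit_linear_lssvm(X[idx], y[idx], lam)
        scores += X_test @ w
    return scores / m


def make_data(n, p, rng):
    """Hypothetical data: two Gaussian classes with means -mu and +mu, identity covariance."""
    y = rng.choice([-1.0, 1.0], size=n)
    mu = np.full(p, 2.0 / np.sqrt(p))  # mean vector with Euclidean norm 2
    X = y[:, None] * mu + rng.standard_normal((n, p))
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = make_data(400, 50, rng)
X_test, y_test = make_data(200, 50, rng)

scores = bootstrap_ensemble_scores(X_train, y_train, X_test,
                                   m=10, lam=0.1, subset_size=200, rng=rng)
# Classify by the sign of the aggregated score.
accuracy = np.mean(np.sign(scores) == y_test)
```

Varying `m` and `lam` in this sketch and recording `accuracy` gives an empirical counterpart to the theoretical selection strategies the abstract refers to.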
