High-Dimensional Analysis of Bootstrap Ensemble Classifiers

Hamza Cherkaoui, Malik Tiomoko, Mohamed El Amine Seddik, Cosme Louart, Ekkehard Schnoor, Balazs Kegl

TL;DR

This paper analyzes bootstrap ensembles of Least Squares Support Vector Machine (LSSVM) classifiers in the high-dimensional regime, using Random Matrix Theory to characterize the ensemble decision score. It shows that the aggregated score is asymptotically normal and derives explicit expressions for its mean and variance, which yield a theoretical classification error and principled guidance for choosing the number of bootstrap subsets $m$ and the regularization parameter $\lambda$. The main contributions are the deterministic-equivalent framework, the closed-form expressions for the decision-score moments, and a practical model-selection protocol validated on synthetic and real datasets. The results give a theory-driven way to tune high-dimensional bootstrap ensembles, with demonstrated improvements in predictive performance, particularly when the data satisfy strong concentration properties. The paper also discusses limitations for feature types that violate the concentration assumptions, which motivates future work on non-linear or heterogeneous-data extensions.

Abstract

Bootstrap methods have long been a cornerstone of ensemble learning in machine learning. This paper presents a theoretical analysis of bootstrap techniques applied to the Least Squares Support Vector Machine (LSSVM) ensemble in the context of large and growing sample sizes and feature dimensionalities. Leveraging tools from Random Matrix Theory, we investigate the performance of this classifier, which aggregates decision functions from multiple weak classifiers, each trained on a different subset of the data. We provide insights into the use of bootstrap methods in high-dimensional settings, enhancing our understanding of their impact. Based on these findings, we propose strategies to select the number of subsets and the regularization parameter that maximize the performance of the LSSVM. Empirical experiments on synthetic and real-world datasets validate our theoretical results.
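The aggregation described in the abstract can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the paper: a linear, bias-free LSSVM solved in its dual form, a random partition of the training set into $m$ disjoint subsets, plain averaging of the individual decision functions, and a toy two-class Gaussian mixture. The helper names (`fit_lssvm`, `fit_ensemble`, `decision_score`, `sample`) are hypothetical.

```python
import numpy as np


def fit_lssvm(X, y, lam):
    """Fit a linear, bias-free LSSVM (assumed form) and return its weight vector."""
    n, p = X.shape
    K = X @ X.T / p                                   # linear kernel matrix
    alpha = np.linalg.solve(K + lam * np.eye(n), y)   # dual coefficients
    return X.T @ alpha / p                            # primal weights, so g(x) = x @ w


def fit_ensemble(X, y, m, lam, rng):
    """Train one LSSVM on each of m disjoint random subsets (assumed subsetting scheme)."""
    parts = np.array_split(rng.permutation(X.shape[0]), m)
    return [fit_lssvm(X[idx], y[idx], lam) for idx in parts]


def decision_score(weights, X_new):
    """Aggregate by averaging the m individual decision functions."""
    return np.mean([X_new @ w for w in weights], axis=0)


def sample(n, mu, rng):
    """Toy two-class Gaussian mixture with means -mu and +mu and identity covariance."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu + rng.standard_normal((n, mu.size))
    return X, y


rng = np.random.default_rng(0)
mu = np.full(50, 0.5)                      # hypothetical class mean, p = 50
X_train, y_train = sample(400, mu, rng)
X_test, y_test = sample(1000, mu, rng)

weights = fit_ensemble(X_train, y_train, m=4, lam=0.1, rng=rng)
scores = decision_score(weights, X_test)
error = float(np.mean(np.sign(scores) != y_test))
print(f"empirical test error with m=4, lam=0.1: {error:.3f}")
```

Sweeping `m` and `lam` in this sketch gives the kind of empirical error surface that the theoretical analysis aims to predict without retraining.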

Paper Structure

This paper contains 38 sections, 1 theorem, 65 equations, 5 figures.

Key Result

Theorem 4.1

Under the concentrated-random-vector assumption and the asymptotic-regime assumption, the decision score $g(\mathbf{x})$ converges in distribution to a normal random variable, $g(\mathbf{x}) \xrightarrow{d} \mathcal{N}(\mathfrak{m}_\ell, \sigma_\ell)$, where $\mathfrak{m}_\ell$ and $\sigma_\ell$ are the mean and variance of the decision score, respectively. Both are given in closed form in the paper. The variance $\sigma_\ell$ incorporates dependencies on the covariance structure and the [...]
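Once the score is known to be Gaussian within each class, a classification error follows from Gaussian tail probabilities. The sketch below is an illustration under stated assumptions, not the paper's formula: the two classes are labelled negative and positive, the decision threshold is 0, the priors are equal by default, and the means and variances passed in are hypothetical placeholders for $\mathfrak{m}_\ell$ and $\sigma_\ell$.

```python
from math import erfc, sqrt


def gaussian_tail(t):
    """Q-function: P(Z > t) for a standard normal Z."""
    return 0.5 * erfc(t / sqrt(2.0))


def theoretical_error(mean_neg, var_neg, mean_pos, var_pos,
                      threshold=0.0, prior_neg=0.5):
    """Misclassification probability when the score is Gaussian in each class.

    mean_* and var_* stand in for the per-class mean and variance of the
    decision score; the threshold and the priors are assumptions of this sketch.
    """
    # Negative-class sample misclassified when its score exceeds the threshold.
    err_neg = gaussian_tail((threshold - mean_neg) / sqrt(var_neg))
    # Positive-class sample misclassified when its score falls below the threshold.
    err_pos = gaussian_tail((mean_pos - threshold) / sqrt(var_pos))
    return prior_neg * err_neg + (1.0 - prior_neg) * err_pos


# Hypothetical moments: symmetric classes with unit variance.
err = theoretical_error(mean_neg=-1.0, var_neg=1.0, mean_pos=1.0, var_pos=1.0)
print(f"theoretical error: {err:.4f}")
```

Evaluating such a function over a grid of $(m, \lambda)$ values, with the moments supplied by the closed-form expressions, is one way a theory-driven hyperparameter search could be organized.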

Figures (5)

  • Figure 1: Comparative scheme between classical grid-search and our theoretical approach for hyperparameter optimization in LSSVM. The heatmaps illustrate the impact of hyperparameter selection on the model's decision boundary.
  • Figure 2: Empirical classification error (top-left) and theoretical classification error (bottom-left) as a function of the number of bootstrap ensemble models $m$ and the $\ell_2$-norm regularization parameter $\lambda$. Darker blue indicates lower error, while lighter yellow indicates higher error. (top-right) Error as a function of $\lambda$ at the white dashed vertical line in the left panels. (bottom-right) Error as a function of $m$ at the white dashed horizontal line in the left panels. The blue line represents the empirical averaged error with transparent standard deviation shading, while the orange line shows the theoretical error.
  • Figure 3: Classification error as a function of the number of bootstrap ensemble models $m$ for $\lambda = 0.01$ (top) and $\lambda = 0.1$ (bottom) with the identity (left) and Toeplitz covariance matrix (right). The orange line represents the empirical error with transparent standard deviation shading, while the blue line represents the theoretical error.
  • Figure 4: Percentage improvement of our theoretical approach over grid-search (blue) and random-search (orange) in terms of classification error. The improvement is calculated as $(\tilde{\varepsilon} - \varepsilon) / \varepsilon$ for each dataset.
  • Figure 5: Classification error as a function of the number of bootstrap ensemble models $m$. The orange line shows the empirical averaged error with transparent standard deviation shading; the blue line represents the theoretical error predicted by our approach.
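The improvement metric in the Figure 4 caption, $(\tilde{\varepsilon} - \varepsilon) / \varepsilon$, can be computed as in the short sketch below. It assumes that $\tilde{\varepsilon}$ is the error of the baseline search (grid or random) and $\varepsilon$ is the error of the theoretical approach; the caption does not state this explicitly, and the numbers used are hypothetical.

```python
def relative_improvement(baseline_error, method_error):
    """Percentage improvement (eps_tilde - eps) / eps, expressed in percent.

    Assumes baseline_error plays the role of eps_tilde and method_error
    the role of eps; positive values mean the method has lower error.
    """
    return 100.0 * (baseline_error - method_error) / method_error


# Hypothetical errors: baseline at 12%, theory-driven selection at 10%.
gain = relative_improvement(0.12, 0.10)
print(f"improvement: {gain:.1f}%")
```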

Theorems & Definitions (6)

  • Definition 3.1: $q$-exponential concentration; observable diameter
  • Theorem 4.1: Asymptotic Distribution of the Decision Score
  • Remark 4.1: Classification Error
  • Definition B.1: $q$-exponential concentration; observable diameter
  • Definition B.2: Deterministic equivalents
  • Remark B.1: Classification Accuracy