Ensemble Performance Through the Lens of Linear Independence of Classifier Votes in Data Streams
Enes Bektas, Fazli Can
TL;DR
This paper addresses how to size classifier ensembles in data streams by framing diversity through the linear independence of classifier votes. It develops a probabilistic framework built on the dependence probabilities $p_l$, defines $P(n,m)$ as the probability of obtaining $m$ linearly independent votes from $n$ classifiers, and proves key results (Theorems 1–3) that connect independence to representational capacity and to diminishing returns. The authors introduce practical tools, including the Ideal Number of Classifiers (INC), its simplified form (SINC), and a closed-form approximation under uniform dependence, and validate them empirically on real and synthetic data with OzaBagging and GOOWE. The findings show a strong relationship between the probability of linear independence (PLI) and accuracy for robust ensembles such as OzaBagging, while complex weighting schemes such as GOOWE may become unstable at high diversity. The framework offers a principled way to allocate resources in data-stream settings and points to extending the theory to heterogeneous dependencies.
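To make $P(n,m)$ and INC more concrete, here is a minimal Python sketch under a hypothetical sequential model, not the paper's derivation. Classifiers are added one at a time, and when the votes collected so far contain $l$ linearly independent ones, the next vote is dependent on them with probability $p_l$ (supplied as the function `p`). The names `independence_distribution` and `inc`, and the reading of INC as the smallest $n$ for which the probability of at least $m$ independent votes reaches a user-specified target, are assumptions for illustration; the paper's exact definitions may differ.

```python
from math import comb
from typing import Callable, List, Optional

# p(l): probability that a new vote is linearly dependent on the l
# independent votes already collected (hypothetical model, not the paper's).
DependenceFn = Callable[[int], float]


def _add_classifier(dist: List[float], p: DependenceFn) -> List[float]:
    """Advance the distribution of the independent-vote count by one classifier."""
    nxt = [0.0] * (len(dist) + 1)
    for l, prob in enumerate(dist):
        nxt[l] += prob * p(l)              # new vote is dependent: count stays l
        nxt[l + 1] += prob * (1.0 - p(l))  # new vote is independent: count becomes l + 1
    return nxt


def independence_distribution(n: int, p: DependenceFn) -> List[float]:
    """Return [P(n, 0), ..., P(n, n)] under the hypothetical sequential model."""
    dist = [1.0]  # with zero classifiers there are zero independent votes
    for _ in range(n):
        dist = _add_classifier(dist, p)
    return dist


def inc(m: int, p: DependenceFn, target: float, n_max: int = 1000) -> Optional[int]:
    """Smallest n with Pr[at least m independent votes] >= target (assumed INC definition).

    Returns None if no n <= n_max reaches the target.
    """
    dist = [1.0]
    for n in range(n_max + 1):
        if n > 0:
            dist = _add_classifier(dist, p)
        if sum(dist[m:]) >= target:
            return n
    return None


if __name__ == "__main__":
    # Uniform dependence: the model reduces to a binomial distribution.
    d = independence_distribution(10, lambda l: 0.3)
    print(abs(d[4] - comb(10, 4) * 0.7**4 * 0.3**6) < 1e-12)  # → True
    # Votes over 3 classes span at most 3 directions, so p(l) = 1 once l = 3.
    print(inc(3, lambda l: 1.0 if l >= 3 else 0.2, target=0.95))  # → 6
```

With uniform dependence $p_l = p$ and no cap on $l$, this particular model gives exactly $P(n,m) = \binom{n}{m}(1-p)^m p^{n-m}$, which is a convenient check on the recurrence. Capping $l$ (here at 3) reflects that vote vectors of fixed length cannot contain more independent directions than their dimension.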
Abstract
Ensemble learning improves classification performance by combining multiple base classifiers. While increasing the number of classifiers generally enhances accuracy, excessively large ensembles can lead to computational inefficiency and diminishing returns. This paper investigates the relationship between ensemble size and performance through the lens of linear independence among classifier votes in data streams. We propose that ensembles composed of linearly independent classifiers maximize representational capacity, particularly under a geometric model. We then generalize the importance of linear independence to the weighted majority voting problem. By modeling the probability of achieving linear independence among classifier outputs, we derive a theoretical framework that explains the trade-off between ensemble size and accuracy. Our analysis leads to a theoretical estimate of the ensemble size required to achieve a user-specified probability of linear independence. We validate our theory through experiments on both real-world and synthetic datasets using two ensemble methods, OzaBagging and GOOWE. Our results confirm that this theoretical estimate effectively identifies the point of performance saturation for robust ensembles like OzaBagging. Conversely, for complex weighting schemes like GOOWE, our framework reveals that high theoretical diversity can trigger algorithmic instability. Our implementation is publicly available to support reproducibility and future research.
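The number of linearly independent votes among $n$ classifiers is the rank of the matrix whose rows are the classifiers' vote vectors. The sketch below assumes that each vote is represented as a numeric vector over the same classes (for example a one-hot prediction or a class-score vector); that representation and the helper name `count_independent_votes` are illustrative assumptions, not the authors' implementation. It uses exact `Fraction` arithmetic so that floating-point round-off cannot hide or create a dependence.

```python
from fractions import Fraction
from typing import List, Sequence


def count_independent_votes(votes: Sequence[Sequence[float]]) -> int:
    """Rank of the vote matrix (rows = classifiers), via exact Gaussian elimination."""
    rows: List[List[Fraction]] = [[Fraction(x) for x in v] for v in votes]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no remaining row has a nonzero entry here
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from every row below the pivot.
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


if __name__ == "__main__":
    # Four classifiers, three classes, one-hot votes on a single instance:
    # two classifiers cast the same vote, so only 3 votes are independent.
    print(count_independent_votes([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]))  # → 3
```

In the example, the duplicated vote adds no new direction, so the fourth classifier leaves the rank unchanged.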
