A Margin-Maximizing Fine-Grained Ensemble Method
Jinghui Yuan, Hao Chen, Renwei Luo, Feiping Nie
TL;DR
The paper tackles the inefficiency of large ensembles in resource-constrained settings and introduces the Margin-Maximizing Fine-Grained Ensemble Method, which trains a confidence matrix $\Theta$ under a margin-based loss $\mathcal{L} = \mathcal{C} - \gamma \mathcal{M}$, where the margin term $\mathcal{M}$ is smoothed via the log-sum-exp expression $\frac{1}{\alpha} \log\left(\sum_{j=1}^c e^{\alpha( \mathcal{S}[ (I \odot \Theta g_i) \mathbf{1} ] - Y_i \odot \mathcal{S}[ (I \odot \Theta g_i) \mathbf{1} ])_j}\right)$. Theoretical results establish convexity of $\mathcal{L}$ with respect to $\mathcal{S}[ (I \odot \Theta g_i) \mathbf{1} ]$ (Theorem 1) and Lipschitz continuity with constant $L \le \sqrt{ck} (1+\gamma+\frac{\gamma}{c} e^{\alpha})$ (Theorem 2), enabling efficient gradient-descent optimization. Empirically, with only $10$ base learners the method outperforms traditional random forests with $100$ trees across multiple datasets, demonstrating improved efficiency and generalization with wider margins between correct and competing class probabilities. This work offers a practical path to high-performing, resource-efficient ensembles suitable for deployment in constrained environments.
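The log-sum-exp expression above has the form $\frac{1}{\alpha}\log\sum_j e^{\alpha z_j}$, a standard smooth upper bound on $\max_j z_j$ that satisfies $\max_j z_j \le \frac{1}{\alpha}\log\sum_j e^{\alpha z_j} \le \max_j z_j + \frac{\log c}{\alpha}$. The following is a minimal sketch of that approximation only, not the authors' implementation. The function name `smooth_max` and the toy values are hypothetical, and reading the masked vector as "competing class scores with the true class zeroed out" is an interpretation of the formula rather than something stated in this summary.

```python
import math

def smooth_max(values, alpha):
    """Log-sum-exp approximation of max(values):
    (1 / alpha) * log(sum_j exp(alpha * v_j))."""
    # Shift by the true maximum before exponentiating for numerical stability.
    m = max(values)
    return m + math.log(sum(math.exp(alpha * (v - m)) for v in values)) / alpha

# Hypothetical per-class scores for one sample after masking out the true
# class (the masked entry is 0.0, here the last position).
competing = [0.10, 0.25, 0.05, 0.0]

for alpha in (1.0, 10.0, 100.0):
    approx = smooth_max(competing, alpha)
    gap_bound = math.log(len(competing)) / alpha
    print(f"alpha={alpha:6.1f}  smooth_max={approx:.4f}  "
          f"true_max={max(competing):.4f}  max_gap={gap_bound:.4f}")
```

Larger $\alpha$ tightens the approximation to the true maximum, which may help explain why $\alpha$ appears in the Lipschitz bound of Theorem 2: a sharper approximation comes with steeper gradients.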
Abstract
Ensemble learning has achieved remarkable success in machine learning, but its reliance on numerous base learners limits its application in resource-constrained environments. This paper introduces the "Margin-Maximizing Fine-Grained Ensemble Method", which surpasses the performance of large-scale ensembles by carefully optimizing a small number of learners and enhancing their generalization capability. We propose a novel learnable confidence matrix that quantifies each classifier's confidence for each category, capturing the category-specific strengths of individual learners. Furthermore, we design a margin-based loss function and use the log-sum-exp technique to construct a smooth and partially convex objective. This approach improves optimization, eases convergence, and enables adaptive confidence allocation. Finally, we prove that the loss function is Lipschitz continuous and, on that basis, develop an efficient gradient optimization algorithm that simultaneously maximizes margins and dynamically adjusts learner weights. Extensive experiments demonstrate that our method outperforms both traditional random forests, while using only one-tenth as many base learners, and other state-of-the-art ensemble methods.
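To make the confidence matrix concrete, the sketch below shows one way a per-class, per-learner weighting could combine base-learner outputs. It is an illustration under stated assumptions, not the authors' code: it assumes $\Theta$ has shape $c \times k$ ($c$ classes, $k$ learners), that $g_i$ stacks the $k$ learners' class scores for sample $i$ as a $k \times c$ matrix, and that $\mathcal{S}$ denotes softmax. Under those assumptions, the term $(I \odot \Theta g_i)\mathbf{1}$ from the TL;DR is the diagonal of $\Theta g_i$, so class $j$ receives the score $\sum_l \Theta_{jl}\,(g_i)_{lj}$. The names `combine` and `softmax` and all numbers are hypothetical.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def combine(theta, g):
    """theta: c x k confidences (row j = class j, column l = learner l).
    g: k x c base-learner outputs for one sample.
    Returns softmax of diag(theta @ g): score_j = sum_l theta[j][l] * g[l][j]."""
    c, k = len(theta), len(g)
    scores = [sum(theta[j][l] * g[l][j] for l in range(k)) for j in range(c)]
    return softmax(scores)

# Toy example: 2 learners, 3 classes (values are made up).
g = [[0.6, 0.3, 0.1],   # learner 0's class scores
     [0.2, 0.5, 0.3]]   # learner 1's class scores
theta = [[1.0, 0.0],    # class 0 relies on learner 0 only
         [0.0, 1.0],    # class 1 relies on learner 1 only
         [0.5, 0.5]]    # class 2 averages both learners
probs = combine(theta, g)
print(probs)
```

Unlike a single weight per learner, each row of `theta` can favor a different learner, which is the "fine-grained" aspect the abstract describes. In the method itself $\Theta$ is learned by gradient descent rather than set by hand.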
