Dynamic Logistic Ensembles with Recursive Probability and Automatic Subset Splitting for Enhanced Binary Classification
Mohammad Zubair Khan, David Li
TL;DR
This paper tackles binary classification in domains where interpretability is essential and data may contain latent internal groupings. It introduces dynamic logistic ensembles that automatically partition data and compute recursive probabilities across multiple layers, preserving interpretability while enhancing predictive power. Key contributions include a novel recursive probability framework, analytic gradient derivations for n-layer ensembles via maximum likelihood, a data augmentation strategy to simulate internal structures, and an open-source Python implementation. The approach demonstrates improved performance over a baseline logistic regression on a wine-quality dataset, with 2–3 layer ensembles offering the best balance between accuracy and generalization, and provides a foundation for scalable, interpretable ensemble methods in practical applications.
Abstract
This paper presents a novel approach to binary classification using dynamic logistic ensemble models. The proposed method addresses the challenges posed by datasets containing inherent internal clusters that lack explicit feature-based separations. By extending traditional logistic regression, we develop an algorithm that automatically partitions the dataset into multiple subsets and constructs an ensemble of logistic models to enhance classification accuracy. A key innovation in this work is the recursive probability calculation, derived through algebraic manipulation and mathematical induction, which enables scalable and efficient model construction. Compared to traditional ensemble methods such as Bagging and Boosting, our approach maintains interpretability while offering competitive performance. Furthermore, we use the maximum likelihood formulation and its associated cost function to derive the recursive gradients analytically as functions of ensemble depth. The effectiveness of the proposed approach is validated on a custom dataset created by introducing noise and shifting data to simulate group structures; on this dataset, adding ensemble layers yields significant performance improvements. Implemented in Python, this work balances computational efficiency with theoretical rigor, providing a robust and interpretable solution for complex classification tasks with broad implications for machine learning applications. Code is available at https://github.com/ensemble-art/Dynamic-Logistic-Ensembles
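The abstract does not state the recursion itself, so the following is only a minimal sketch of one way a recursive probability over layered logistic models could be computed. It assumes a complete binary tree in which each internal node is a logistic "gate" that softly routes a sample to its two children and each leaf is a logistic classifier. The function names, the heap-ordered weight layout, and the `depth` parameter are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def ensemble_probability(X, weights, depth, node=0):
    """Recursively compute P(y=1 | x) for a complete binary tree of logistic models.

    Assumed layout (hypothetical, for illustration only):
      X       : array of shape (n_samples, n_features)
      weights : array of shape (2**(depth + 1) - 1, n_features + 1),
                heap-ordered (children of node k are 2k+1 and 2k+2),
                with the bias term in column 0
      depth   : number of gating layers above the leaves
    """
    # Logistic output of the current node for every sample.
    p = sigmoid(weights[node, 0] + X @ weights[node, 1:])
    if depth == 0:
        # Leaf: the output is the class probability itself.
        return p
    # Internal node: p is the soft probability of routing to the left child.
    left = ensemble_probability(X, weights, depth - 1, 2 * node + 1)
    right = ensemble_probability(X, weights, depth - 1, 2 * node + 2)
    return p * left + (1.0 - p) * right


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))
    W = rng.normal(size=(7, 4))  # 7 nodes: 3 gates and 4 leaves for depth=2
    print(ensemble_probability(X, W, depth=2))
```

Under these assumptions, `depth=0` reduces to a single logistic regression, which matches the baseline the ensembles are compared against, and every output is a convex combination of leaf probabilities, so it stays in [0, 1].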
