Dynamic Logistic Ensembles with Recursive Probability and Automatic Subset Splitting for Enhanced Binary Classification

Mohammad Zubair Khan, David Li

TL;DR

This paper tackles binary classification in domains where interpretability is essential and data may contain latent internal groupings. It introduces dynamic logistic ensembles that automatically partition data and compute recursive probabilities across multiple layers, preserving interpretability while enhancing predictive power. Key contributions include a novel recursive probability framework, analytic gradient derivations for n-layer ensembles via maximum likelihood, a data augmentation strategy to simulate internal structures, and an open-source Python implementation. The approach demonstrates improved performance over a baseline logistic regression on a wine-quality dataset, with 2–3 layer ensembles offering the best balance between accuracy and generalization, and provides a foundation for scalable, interpretable ensemble methods in practical applications.

Abstract

This paper presents a novel approach to binary classification using dynamic logistic ensemble models. The proposed method addresses the challenges posed by datasets containing inherent internal clusters that lack explicit feature-based separations. By extending traditional logistic regression, we develop an algorithm that automatically partitions the dataset into multiple subsets and constructs an ensemble of logistic models to enhance classification accuracy. A key innovation in this work is the recursive probability calculation, derived through algebraic manipulation and mathematical induction, which enables scalable and efficient model construction. Compared to traditional ensemble methods such as Bagging and Boosting, our approach maintains interpretability while offering competitive performance. Furthermore, we systematically employ maximum likelihood and cost functions to facilitate the analytical derivation of recursive gradients as functions of ensemble depth. The effectiveness of the proposed approach is validated on a custom dataset created by introducing noise and shifting data to simulate group structures, on which adding ensemble layers yields significant performance improvements. Implemented in Python, this work balances computational efficiency with theoretical rigor, providing a robust and interpretable solution for complex classification tasks with broad implications for machine learning applications. Code is available at https://github.com/ensemble-art/Dynamic-Logistic-Ensembles
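
The abstract describes the custom dataset only as "introducing noise and shifting data to simulate group structures". As a rough illustration of that idea, and not the authors' actual procedure, the sketch below appends a shifted, noise-perturbed copy of each sample so that the data contains two latent groups with no explicit group feature. The function name `augment_with_shifted_copy` and the parameters `shift`, `noise_std`, and `seed` are hypothetical.

```python
import random

def augment_with_shifted_copy(rows, shift, noise_std, seed=0):
    """Return the original rows plus one shifted, noisy copy of each row.

    ASSUMPTION: a single constant shift and Gaussian noise per feature;
    the source does not specify how the shift or noise is applied.
    rows is a list of (features, label) pairs, features being a list of floats.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    # Keep the original samples as the first latent group.
    augmented = [(list(features), label) for features, label in rows]
    # Add a second latent group: same labels, displaced and perturbed features.
    for features, label in rows:
        shifted = [value + shift + rng.gauss(0.0, noise_std) for value in features]
        augmented.append((shifted, label))
    return augmented

# Hypothetical usage with two toy samples.
toy_rows = [([1.0, 2.0], 0), ([3.0, 4.0], 1)]
toy_augmented = augment_with_shifted_copy(toy_rows, shift=5.0, noise_std=0.1)
```

Because the group membership is never written into the features, a single logistic model sees one mixed population, which is the situation the ensemble is meant to handle.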

Paper Structure

This paper contains 33 sections, 27 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Illustration of clusters and decision boundaries. Cluster A (blue circle) and Cluster B (orange circle) have identical true decision boundaries (green sine curves). A single model's decision boundary (red dashed curve) attempts to fit both clusters but fails to accurately classify the data due to inherent limitations, even when using complex boundaries.
  • Figure 2: Illustration of recursive probability calculations in the dynamic logistic ensemble model across multiple layers. Each node $h_j(x)$ represents a logistic regression model. The formulas on the right demonstrate how the probabilities are recursively expanded at each layer, starting from the root node and incorporating the outputs of the child nodes to compute the final probability $P(1|x)$.
  • Figure 3: Logistic ensemble tree, where $n$ is the layer index.
  • Figure 4: Cost function convergence of the baseline logistic regression model.
  • Figure 5: Cost function convergence of the 1-layer, 2-layer, 3-layer, and 4-layer ensemble models. Ordered left to right.
  • ...and 1 more figure
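
The Figure 2 caption describes a tree in which each node $h_j(x)$ is a logistic regression model and the final probability $P(1|x)$ is expanded recursively from the root through the child nodes. The exact formulas are not reproduced in this section, so the sketch below assumes one plausible form: each internal node's output weights its left subtree, and its complement weights the right subtree. That mixing rule, the heap-style node layout, and the names `sigmoid`, `logistic`, and `ensemble_probability` are assumptions for illustration, not the paper's stated definitions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic(weights, bias, x):
    """Output of one logistic node h_j(x) for feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def ensemble_probability(nodes, x, index=0):
    """Recursively compute P(1|x) for a complete binary tree of logistic nodes.

    ASSUMPTION: nodes is a list of (weights, bias) pairs stored heap-style,
    so node i has children 2*i + 1 and 2*i + 2, and len(nodes) is 2**k - 1.
    ASSUMPTION: an internal node mixes its subtrees as
        P_i = h_i * P_left + (1 - h_i) * P_right.
    """
    weights, bias = nodes[index]
    h = logistic(weights, bias, x)
    left, right = 2 * index + 1, 2 * index + 2
    if left >= len(nodes):
        return h  # leaf node: a plain logistic regression output
    p_left = ensemble_probability(nodes, x, left)
    p_right = ensemble_probability(nodes, x, right)
    return h * p_left + (1.0 - h) * p_right

# Hypothetical usage: a root with two leaf models on 2-feature input.
toy_nodes = [([0.5, -0.2], 0.1), ([1.0, 0.0], 0.0), ([0.0, 1.0], -0.5)]
toy_p = ensemble_probability(toy_nodes, [0.3, 0.7])
```

Under this assumed rule the result is a convex combination of leaf outputs, so it always stays in [0, 1], and a tree with a single node reduces to ordinary logistic regression.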