Binary AddiVortes: (Bayesian) Additive Voronoi Tessellations for Binary Classification with an application to Predicting Home Mortgage Application Outcomes
Adam J. Stone, Emmanuel Ogundimu, John Paul Gosling
TL;DR
This work extends the AddiVortes framework to binary classification by embedding a probit latent-variable model within a sum-of-tessellations design, enabling probabilistic predictions and uncertainty quantification for binary outcomes. Through data augmentation and Bayesian backfitting, the method captures complex, local covariate interactions via multiple Voronoi tessellations while applying regularization to prevent overfitting. Empirical results on benchmark binary datasets and a mortgage-approval application show that AddiVortes frequently achieves higher AUC than, and accuracy competitive with, random forests (RF), BART, and XGBoost, and that it remains interpretable through variable inclusion and posterior intervals. The mortgage analysis demonstrates practical impact for financial decision-making, combining strong predictive performance with transparent, region-specific influence of covariates, and the approach is positioned for extensions to multinomial and time-to-event contexts.
Abstract
The Additive Voronoi Tessellations (AddiVortes) model is a multivariate regression model that uses multiple Voronoi tessellations to partition the covariate space for an additive ensemble model. In this paper, the AddiVortes framework is extended to binary classification by incorporating a probit model with a latent variable formulation. Specifically, we utilise a data augmentation technique in which a continuous latent variable is introduced and the binary response is determined by thresholding it. Across a range of metrics, the AddiVortes model outperforms random forests, BART and other leading black-box regression models in most cases. A comprehensive analysis is conducted using AddiVortes to predict an individual's likelihood of being approved for a home mortgage, based on a range of covariates. This evaluation highlights the model's effectiveness in capturing complex relationships within the data and its potential for improving decision-making in mortgage approval processes.
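To make the latent-variable idea concrete, the following is a minimal sketch of one common form of probit data augmentation, not necessarily the exact sampler used in the paper. It assumes a unit-variance Gaussian latent variable thresholded at zero. The function names `sample_latent` and `predict_prob` are hypothetical, and `f` stands in for the summed output of the tessellation ensemble at a single observation.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def sample_latent(y, f, rng):
    """Draw z ~ N(f, 1), truncated to z > 0 if y == 1 and to z <= 0 if y == 0.

    Assumption: `f` is the current ensemble fit for this observation.
    Uses inverse-CDF sampling, which is adequate for moderate |f| but
    loses accuracy far in the tails.
    """
    # Under N(f, 1), P(z <= 0) = Phi(-f).
    p_nonpositive = _STD_NORMAL.cdf(-f)
    if y == 1:
        u = rng.uniform(p_nonpositive, 1.0)   # upper part of the distribution
    else:
        u = rng.uniform(0.0, p_nonpositive)   # lower part of the distribution
    # inv_cdf is undefined at exactly 0 and 1, so keep u strictly inside.
    eps = 1e-12
    u = min(max(u, eps), 1.0 - eps)
    return f + _STD_NORMAL.inv_cdf(u)


def predict_prob(f):
    """Probit link: P(y = 1 | x) = Phi(f(x))."""
    return _STD_NORMAL.cdf(f)


if __name__ == "__main__":
    rng = random.Random(1)
    fits = [-1.2, 0.3, 0.9]      # hypothetical ensemble outputs
    labels = [0, 1, 1]           # hypothetical binary responses
    latents = [sample_latent(y, f, rng) for y, f in zip(labels, fits)]
    print(latents)
    print([predict_prob(f) for f in fits])
```

In a sampler of this kind, the drawn latent values would then play the role of a continuous response for the usual regression-style backfitting update, and the two steps would alternate.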
