Ensemble Methodology: Innovations in Credit Default Prediction Using LightGBM, XGBoost, and LocalEnsemble
Mengran Zhu, Ye Zhang, Yulu Gong, Kaijuan Xing, Xu Yan, Jintong Song
TL;DR
The paper addresses the challenge of accurate credit default prediction in large-scale, time-series–like consumer data. It proposes a three-module Ensemble Methodology combining LightGBM, XGBoost, and LocalEnsemble, with distinct feature sets and out-of-fold meta-features to boost diversity and generalization. The approach is validated on the American Express dataset, where the Ensemble Model achieves the best public and private performance (M metric) compared to a range of baselines, highlighting improvements in both discriminatory power and recall at critical thresholds. This framework offers a robust, scalable solution for risk assessment in lending and sets a new practical benchmark for credit default prediction models.
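The summary above reports results on an "M metric" without defining it. As a minimal sketch, assuming M is the metric from the public American Express default prediction competition (the mean of a normalized weighted Gini coefficient, G, and the share of defaults captured in the top 4% of predictions, D, with non-defaults weighted 20 to undo downsampling), it could be computed as follows. The function name, parameter names, and defaults are illustrative and are not taken from the paper.

```python
def amex_style_metric(y_true, y_pred, neg_weight=20.0, top_frac=0.04):
    """Return 0.5 * (G + D) for binary labels (1 = default) and scores.

    Assumption: this mirrors the publicly described competition metric;
    the paper's exact implementation is not shown in this section.
    """
    def weights(labels):
        # Non-defaults are up-weighted to compensate for downsampling.
        return [1.0 if t == 1 else neg_weight for t in labels]

    def ranked(scores):
        # Indices sorted from highest to lowest score.
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    def top_captured(labels, scores):
        # D: fraction of all defaults found within the top 4% of total weight.
        w = weights(labels)
        cutoff = int(top_frac * sum(w))
        cum_w, captured = 0.0, 0
        for i in ranked(scores):
            cum_w += w[i]
            if cum_w > cutoff:
                break
            captured += labels[i]
        return captured / sum(labels)

    def weighted_gini(labels, scores):
        # Weighted area between the Lorenz curve of defaults and the diagonal.
        w = weights(labels)
        total_w = sum(w)
        total_pos = sum(t * wi for t, wi in zip(labels, w))
        cum_w = cum_pos = gini = 0.0
        for i in ranked(scores):
            cum_w += w[i]
            cum_pos += labels[i] * w[i]
            gini += (cum_pos / total_pos - cum_w / total_w) * w[i]
        return gini

    # G: Gini of the model divided by the Gini of a perfect ranking.
    g = weighted_gini(y_true, y_pred) / weighted_gini(y_true, y_true)
    d = top_captured(y_true, y_pred)
    return 0.5 * (g + d)
```

Under this definition, G reflects overall ranking quality (discriminatory power) and D reflects recall at a fixed high-risk cutoff, which matches the two kinds of improvement the summary mentions. The sketch assumes at least one default in `y_true`; otherwise the divisions fail.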
Abstract
In consumer lending, accurate credit default prediction is critical to risk mitigation and to optimizing lending decisions. Extensive research has sought to improve existing models, both to enhance customer experience and to keep lending institutions economically sound. This study challenges conventional models for credit default prediction and, building on foundational research and recent innovations, aims to raise the standard of accuracy for the task. To address the limitations identified in previous studies, we present an ensemble framework comprising LightGBM, XGBoost, and LocalEnsemble modules, each trained on a distinct feature set so that the modules contribute diverse predictions and the combined model generalizes better. Our experimental findings on the American Express dataset validate the effectiveness of the ensemble model. The approach addresses existing obstacles and sets a precedent for advancing the accuracy and robustness of credit default prediction models.
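The framework's combination of distinct feature sets with out-of-fold meta-features can be illustrated with a small sketch. Each base learner is trained on K-1 folds and predicts only the held-out fold, so every row's meta-feature comes from a model that never saw that row. Everything below is an assumption made for illustration: the helper names, the toy data, and the stand-in learner `fit_threshold_rate` are hypothetical, and the stand-in takes the place of the LightGBM and XGBoost modules purely to keep the example dependency-free.

```python
import random


def out_of_fold_predictions(X, y, fit, n_folds=5, seed=0):
    """Return one held-out prediction per row.

    `fit(X_train, y_train)` must return a function mapping a row to a score.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    oof = [None] * len(X)
    for k, held_out in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i] = predict(X[i])  # row i was not in this model's training data
    return oof


def fit_threshold_rate(feature):
    """Hypothetical stand-in learner using a single feature.

    It splits the training rows at the feature's mean and predicts the
    observed default rate on each side of the split.
    """
    def fit(X_train, y_train):
        cut = sum(row[feature] for row in X_train) / len(X_train)
        hi = [t for row, t in zip(X_train, y_train) if row[feature] > cut]
        lo = [t for row, t in zip(X_train, y_train) if row[feature] <= cut]
        rate_hi = sum(hi) / len(hi) if hi else 0.0
        rate_lo = sum(lo) / len(lo) if lo else 0.0
        return lambda row: rate_hi if row[feature] > cut else rate_lo
    return fit


# Toy data (synthetic, not from the paper): two features, label 1 = default.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if 0.7 * a + 0.3 * b > 0.6 else 0 for a, b in X]

# Each base learner sees a different feature, echoing the distinct feature sets.
meta_features = list(zip(
    out_of_fold_predictions(X, y, fit_threshold_rate(0)),
    out_of_fold_predictions(X, y, fit_threshold_rate(1)),
))
```

In a real pipeline, `fit` would wrap the training of a gradient-boosted model, and `meta_features` would feed a second-stage model. How the paper's LocalEnsemble module actually consumes these features is not specified in this section.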
