A Large Language Model for Corporate Credit Scoring
Chitro Majumdar, Sergio Scandizzo, Ratanlal Mahanta, Avradip Mandal, Swarnendu Bhattacharjee
TL;DR
Omega^2 addresses the need for accurate, interpretable corporate credit scoring that remains robust over time and across rating agencies. It couples language-based reasoning with gradient-boosting models, optimized under temporal validation, through a Resource Augmented Generation framework anchored by a vector store and a knowledge graph. The approach achieves mean test AUCs above $0.93$ across Moody’s, S&P, Fitch, and Egan-Jones, delivers interpretable feature attributions via SHAP, and reaches a regression $R^2$ of around $0.60$ on unseen data. The study provides a reproducible, open-source blueprint for institution-grade AI in credit risk that generalizes across agencies and time.
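The TL;DR names a vector store as one anchor of the generation pipeline but does not describe its interface. The code below is a minimal sketch of what retrieval from such a store can involve. It assumes brute-force cosine similarity over precomputed embeddings, ranks stored passages against a query embedding, and assembles the top hits into a prompt context. `Document`, `InMemoryVectorStore`, `build_prompt`, the passage texts, and the three-dimensional toy embeddings are all hypothetical. A real system would obtain embeddings from a model and would also consult the knowledge graph, which is not shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stored passage with a precomputed embedding."""
    doc_id: str
    text: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical brute-force vector store: exact cosine search over all documents."""

    def __init__(self) -> None:
        self.docs: list[Document] = []

    def add(self, doc: Document) -> None:
        self.docs.append(doc)

    def top_k(self, query: list[float], k: int = 2) -> list[Document]:
        # Score every stored document against the query, then keep the k most similar.
        ranked = sorted(self.docs, key=lambda d: cosine_similarity(query, d.embedding), reverse=True)
        return ranked[:k]


def build_prompt(question: str, retrieved: list[Document]) -> str:
    # Prepend retrieved passages, tagged by id, as grounding context for the language model.
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy usage with hand-written embeddings (illustrative only, not from the paper).
store = InMemoryVectorStore()
store.add(Document("leverage", "Analyst notes on the issuer's leverage.", [0.9, 0.1, 0.0]))
store.add(Document("liquidity", "Analyst notes on the issuer's liquidity.", [0.1, 0.9, 0.0]))
store.add(Document("profitability", "Analyst notes on the issuer's profitability.", [0.0, 0.2, 0.9]))

hits = store.top_k([1.0, 0.0, 0.0], k=2)
prompt = build_prompt("How leveraged is the issuer?", hits)
```

Exact search is used only to keep the sketch short. Larger stores typically rely on approximate nearest-neighbour indexes instead.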
Abstract
We introduce Omega^2, a Large Language Model (LLM)-driven framework for corporate credit scoring that combines structured financial data with machine learning to improve predictive reliability and interpretability. We evaluate Omega^2 on a multi-agency dataset of 7,800 corporate credit ratings from Moody's, Standard & Poor's, Fitch, and Egan-Jones, each paired with firm-level financial indicators such as leverage, profitability, and liquidity ratios. The system integrates CatBoost, LightGBM, and XGBoost models optimized through Bayesian search under temporal validation, which keeps the evaluation forward-looking and reproducible. Omega^2 achieves a mean test AUC above 0.93 across agencies, indicating that it generalizes across rating systems and remains consistent over time. These results show that combining language-based reasoning with quantitative learning provides a transparent, institution-grade foundation for corporate credit-risk assessment.
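The abstract reports test AUC under temporal validation but does not spell out the protocol. The sketch below assumes an expanding-window design: train on all years before year t, then evaluate on year t. It computes AUC with its rank-based definition, the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half. The one-feature `fit_direction` "model", the helper names, and the toy data are hypothetical stand-ins for the CatBoost, LightGBM, and XGBoost models used in the paper. No Bayesian search is shown.

```python
from statistics import mean


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties counted 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def temporal_splits(years: list[int], min_train_years: int = 2):
    """Expanding-window splits: yield (test_year, train_idx, test_idx), training only on earlier years."""
    distinct = sorted(set(years))
    for i in range(min_train_years, len(distinct)):
        test_year = distinct[i]
        train_idx = [j for j, y in enumerate(years) if y < test_year]
        test_idx = [j for j, y in enumerate(years) if y == test_year]
        yield test_year, train_idx, test_idx


def fit_direction(x_train: list[float], y_train: list[int]) -> float:
    """Hypothetical one-feature model: learn whether higher x goes with label 1 (+1.0) or label 0 (-1.0)."""
    pos_mean = mean(x for x, y in zip(x_train, y_train) if y == 1)
    neg_mean = mean(x for x, y in zip(x_train, y_train) if y == 0)
    return 1.0 if pos_mean > neg_mean else -1.0


# Illustrative toy data (not from the paper): year, leverage ratio, 1 = higher rating class.
years = [2018, 2018, 2019, 2019, 2020, 2020, 2021, 2021, 2021]
leverage = [1.0, 3.0, 1.5, 2.5, 0.8, 3.2, 1.2, 2.8, 3.0]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1]

results = {}
for test_year, train_idx, test_idx in temporal_splits(years):
    # Fit only on past years, then score the held-out year.
    direction = fit_direction([leverage[j] for j in train_idx], [labels[j] for j in train_idx])
    scores = [direction * leverage[j] for j in test_idx]
    results[test_year] = roc_auc([labels[j] for j in test_idx], scores)

mean_auc = mean(list(results.values()))
```

Here the 2020 fold separates the classes perfectly and the 2021 fold does not, so the per-year AUCs differ. Averaging across folds mirrors the idea of a mean test AUC, although the paper's exact aggregation is not stated in this section.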
