Are Foundation Models Useful for Bankruptcy Prediction?

Marcin Kostrzewa, Oleksii Furman, Roman Furman, Sebastian Tomczak, Maciej Zięba

TL;DR

This work systematically evaluates foundation models for corporate bankruptcy prediction on large-scale, structured financial data from the V4 region. By comparing Llama-3.3-70B-Instruct and TabPFN against traditional baselines (e.g., XGBoost, CatBoost) across horizons from $h=0$ to $h=4$, the study finds that classical gradient-boosting methods outperform foundation models in both ROC-AUC and $F_1$-score, while LLMs yield poorly calibrated probability estimates. TabPFN, though sometimes competitive in $F_1$-scores, does not consistently surpass simpler baselines and incurs substantial computational overhead. The results imply that current general-purpose foundation models are not yet suitable replacements for specialized ML methods in bankruptcy forecasting, and future work should investigate improved reasoning LLMs, accessible weights/logits, and hybrid multimodal approaches to leverage textual and numerical data. Overall, the paper provides important guidance for practitioners choosing predictive models for risk-sensitive bankruptcy forecasting and outlines clear directions for improving foundation-model-based approaches in finance.

Abstract

Foundation models have shown promise across various financial applications, yet their effectiveness for corporate bankruptcy prediction remains systematically unevaluated against established methods. We study bankruptcy forecasting using Llama-3.3-70B-Instruct and TabPFN, evaluated on large, highly imbalanced datasets of over one million company records from the Visegrád Group. We provide the first systematic comparison of foundation models against classical machine learning baselines for this task. Our results show that models such as XGBoost and CatBoost consistently outperform foundation models across all prediction horizons. LLM-based approaches suffer from unreliable probability estimates, undermining their use in risk-sensitive financial settings. TabPFN, while competitive with simpler baselines, requires substantial computational resources with costs not justified by performance gains. These findings suggest that, despite their generality, current foundation models remain less effective than specialized methods for bankruptcy forecasting.
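To illustrate why unreliable probability estimates matter even when ranking is good, the sketch below computes ROC-AUC, $F_1$-score, and the Brier score in plain Python on a small toy sample. The labels, both score vectors, the 0.5 decision threshold, and the choice of the Brier score as a calibration measure are all assumptions made for illustration; none of them are data, settings, or results from the paper.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """F1 of the positive class after thresholding the scores."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def brier_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((s - y) ** 2 for y, s in zip(y_true, scores)) / len(y_true)


# Toy, imbalanced sample (hypothetical): 8 healthy firms, 2 bankruptcies.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Two hypothetical models that rank the firms the same way.
reasonable = [0.05, 0.05, 0.10, 0.10, 0.10, 0.15, 0.20, 0.30, 0.60, 0.80]
overconfident = [0.80, 0.80, 0.85, 0.85, 0.85, 0.90, 0.90, 0.90, 0.95, 0.99]

for name, s in [("reasonable", reasonable), ("overconfident", overconfident)]:
    print(name, roc_auc(y, s), f1_score(y, s), brier_score(y, s))
```

Both toy models separate the two bankruptcies from the healthy firms, so their ROC-AUC is identical. The second model's probabilities are pushed toward 1, so a fixed threshold flags every firm and the $F_1$-score and Brier score degrade. This is one way a ranking metric can hide the kind of miscalibration the abstract describes.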

Paper Structure

This paper contains 34 sections, 1 figure, 6 tables.

Figures (1)

  • Figure 1: Distribution of self-reported classification output probabilities for Llama-3.3 in both zero-shot and in-context learning variants across all datasets.