Open Banking Foundational Model: Learning Language Representations from Few Financial Transactions
Gustavo Polleti, Marlesson Santana, Eduardo Fontes
TL;DR
The paper presents a multimodal foundational model for financial transactions that unifies structured attributes and unstructured descriptions by encoding each transaction as a sentence and the account history as a document. A BERT-style MLM is fine-tuned on a private North American dataset (∼10 million accounts) to produce contextual embeddings, with the [CLS] token serving as the account representation. Evaluated on 19 downstream tasks spanning demographics, risk, banking, and geolocation, the approach outperforms handcrafted features and discrete-event baselines, particularly in data-scarce Open Banking scenarios, and demonstrates cross-institution and cross-geography generalization. The findings highlight the practical potential for fraud prevention, credit risk assessment, and customer insights using self-supervised multimodal representations in Open Banking contexts.
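The "transaction as a sentence, account history as a document" encoding can be illustrated with a minimal sketch. The field names (`date`, `amount`, `description`), the sentence template, and the use of `[SEP]` as a separator are assumptions made for illustration only; the summary above does not specify the paper's actual serialization format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    # Hypothetical fields: the paper's actual attribute set is not given here.
    date: str
    amount: float
    description: str


def transaction_to_sentence(tx: Transaction) -> str:
    """Render structured attributes and the free-text description as one sentence."""
    direction = "credit" if tx.amount >= 0 else "debit"
    return f"{tx.date} {direction} {abs(tx.amount):.2f} {tx.description.strip().lower()}"


def account_to_document(history: List[Transaction], sep: str = " [SEP] ") -> str:
    """Join transaction sentences into one document, prefixed with [CLS]."""
    sentences = [transaction_to_sentence(tx) for tx in history]
    return "[CLS] " + sep.join(sentences)


history = [
    Transaction("2024-01-03", -12.5, "COFFEE SHOP #123"),
    Transaction("2024-01-05", 2500.0, "PAYROLL ACME INC"),
]
doc = account_to_document(history)
print(doc)
```

In a BERT-style encoder, the hidden state at the `[CLS]` position of such a document would then be read out as the account-level embedding.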
Abstract
We introduce a multimodal foundational model for financial transactions that integrates both structured attributes and unstructured textual descriptions into a unified representation. By adapting masked language modeling to transaction sequences, we demonstrate that our approach not only outperforms classical feature engineering and discrete event sequence methods but is also particularly effective in data-scarce Open Banking scenarios. To our knowledge, this is the first large-scale study across thousands of financial institutions in North America, providing evidence that multimodal representations can generalize across geographies and institutions. These results highlight the potential of self-supervised models to advance financial applications ranging from fraud prevention and credit risk to customer insights.
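As a rough illustration of masked language modeling applied to a tokenized transaction sequence, the sketch below hides a random subset of tokens that a model would then be trained to recover. The 15% masking rate is the conventional BERT default and is an assumption here, as are the whitespace tokenization and the `mask_tokens` helper; none of these is confirmed as the authors' configuration.

```python
import random
from typing import List, Tuple

SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def mask_tokens(
    tokens: List[str], mask_prob: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[int]]:
    """Replace a random subset of non-special tokens with [MASK].

    Returns the masked sequence and the positions the model must predict.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    masked = list(tokens)
    positions = []
    for i, tok in enumerate(tokens):
        if tok in SPECIAL_TOKENS:
            continue  # never mask structural markers
        if rng.random() < mask_prob:
            masked[i] = "[MASK]"
            positions.append(i)
    return masked, positions


# Naive whitespace tokenization of a serialized account history (illustrative only).
tokens = "[CLS] 2024-01-03 debit 12.50 coffee shop [SEP] 2024-01-05 credit 2500.00 payroll".split()
masked, positions = mask_tokens(tokens)
print(masked, positions)
```

The training objective is then to predict the original token at each returned position from the surrounding context, which is what lets the model learn from both amounts and description text without labels.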
