Generative AI Enhanced Financial Risk Management Information Retrieval
Amin Haeri, Jonathan Vitrano, Mahdi Ghelichi
TL;DR
This work tackles the challenge of extracting regulatory insights for financial risk management by developing a domain-specific QA dataset (RiskData) and a finetuned embedding model (RiskEmbed) for use within a Retrieval-Augmented Generation (RAG) framework. Using OSFI guidelines, the authors generate thousands of positive QA pairs and show that domain adaptation yields substantially better ranking metrics than baseline models while keeping a compact 768-dimensional embedding. RiskEmbed outperforms both general-purpose and finance-specific embeddings on risk-management QA tasks, and the dataset and model are open-sourced to support industry and research adoption. The study also outlines future work, including improved negative mining, vocabulary expansion, and broader regulatory coverage to generalize across financial systems.
Abstract
Risk management in finance involves recognizing, evaluating, and addressing financial risks to maintain stability and ensure regulatory compliance. Extracting relevant insights from extensive regulatory documents is a complex challenge that requires advanced retrieval and language models. This paper introduces RiskData, a dataset curated specifically for finetuning embedding models in risk management, and RiskEmbed, a finetuned embedding model designed to improve retrieval accuracy in financial question-answering systems. The dataset is derived from 94 regulatory guidelines published by the Office of the Superintendent of Financial Institutions (OSFI) between 1991 and 2024. We finetune a state-of-the-art Sentence-BERT embedding model to improve domain-specific retrieval performance, primarily for use in Retrieval-Augmented Generation (RAG) systems. Experimental results demonstrate that RiskEmbed significantly outperforms general-purpose and financial embedding models, achieving substantial improvements in ranking metrics. By open-sourcing both the dataset and the model, we provide a valuable resource for financial institutions and researchers aiming to develop more accurate and efficient risk management AI solutions.
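To make "ranking metrics" concrete, the following is a minimal sketch of how an embedding-based retriever can be scored on positive QA pairs. It is an illustration under stated assumptions, not the evaluation code used for RiskEmbed: the abstract does not name its metrics, so Mean Reciprocal Rank (MRR) and recall@k are assumed here as common choices, the function names are hypothetical, and toy 3-dimensional vectors stand in for real 768-dimensional embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(query, passages):
    """Return passage indices ordered from most to least similar to the query."""
    scores = [cosine(query, p) for p in passages]
    return sorted(range(len(passages)), key=lambda i: -scores[i])


def mrr_and_recall_at_k(rankings, gold, k):
    """Score rankings when each query has exactly one relevant (gold) passage.

    Assumption: one positive passage per question, as in a positive-pair QA set.
    """
    # 1-based position of the gold passage in each query's ranking
    positions = [ranking.index(g) + 1 for ranking, g in zip(rankings, gold)]
    mrr = sum(1.0 / p for p in positions) / len(positions)
    recall = sum(1 for p in positions if p <= k) / len(positions)
    return mrr, recall


# Toy embeddings (assumption: placeholders for model outputs)
passages = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[0.9, 0.1, 0.0], [0.6, 0.5, 0.0], [0.0, 0.2, 0.9]]
gold = [0, 1, 2]  # index of the relevant passage for each query

rankings = [rank_passages(q, passages) for q in queries]
mrr, recall_at_1 = mrr_and_recall_at_k(rankings, gold, k=1)
```

In this toy setup the second query ranks its gold passage second, so both metrics fall below 1.0. Comparing such scores before and after finetuning is one way a domain-adapted embedding model could be judged against a baseline.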
