The LLM Pro Finance Suite: Multilingual Large Language Models for Financial Applications

Gaëtan Caillaut, Raheel Qader, Jingshu Liu, Mariam Nakhlé, Arezki Sadoune, Massinissa Ahmim, Jean-Gabriel Barthelemy

TL;DR

This work addresses the limits of generalist LLMs in finance by building the LLM Pro Finance Suite, a set of five instruction-tuned models ranging from 8B to 70B parameters, trained on a multilingual corpus in which more than 50% of the data is finance-related. The authors start from generalist instruction-tuned models, add domain-specific data through continued pre-training (CPT) and supervised fine-tuning (SFT) pipelines, and strengthen multilingual ability with translation data, aiming to preserve broad task performance while improving financial reasoning, translation, and advisory capabilities. They introduce a finance-focused benchmark suite, evaluate on both general and finance tasks, and publicly release two 8B models to enable further research. Their results show consistent gains on financial tasks and financial translation without sacrificing general language abilities; the paper also examines RAG and toxicity and discusses the limitations of its evaluation. The work thus offers a practical path, backed by two openly released models, toward robust multilingual financial NLP, with potential for retrieval-augmented and agentic finance workflows in real-world applications.

Abstract

The financial industry's growing demand for advanced natural language processing (NLP) capabilities has highlighted the limitations of generalist large language models (LLMs) in handling domain-specific financial tasks. To address this gap, we introduce the LLM Pro Finance Suite, a collection of five instruction-tuned LLMs (ranging from 8B to 70B parameters) specifically designed for financial applications. Our approach focuses on enhancing generalist instruction-tuned models, leveraging their existing strengths in instruction following, reasoning, and toxicity control, while fine-tuning them on a curated, high-quality financial corpus comprising over 50% finance-related data in English, French, and German. We evaluate the LLM Pro Finance Suite on a comprehensive financial benchmark suite, demonstrating consistent improvement over state-of-the-art baselines in finance-oriented tasks and financial translation. Notably, our models maintain the strong general-domain capabilities of their base models, ensuring reliable performance across non-specialized tasks. This dual proficiency (enhanced financial expertise without compromising general abilities) makes the LLM Pro Finance Suite an ideal drop-in replacement for existing LLMs in financial workflows, offering improved domain-specific performance while preserving overall versatility. We publicly release two 8B-parameter models to foster future research and development in financial NLP applications: https://huggingface.co/collections/DragonLLM/llm-open-finance.

Paper Structure

This paper contains 28 sections, 4 equations, 2 figures, 9 tables.

Figures (2)

  • Figure 1: Relative improvement (in %) of a subset of the LLM Pro Finance models over their corresponding baseline on the financial translation task.
  • Figure 2: BLEU scores of a subset of the LLM Pro Finance models on the financial translation task.
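
The two captions above relate an absolute metric (BLEU) to a relative improvement over a baseline. As a minimal sketch of how such a percentage is commonly computed, the snippet below uses the usual definition, 100 × (model − baseline) / baseline. This definition is an assumption, since the listing does not give the paper's exact formula; the function name `relative_improvement` and the scores are hypothetical and not taken from the paper.

```python
def relative_improvement(model_score: float, baseline_score: float) -> float:
    """Return the percentage change of model_score relative to baseline_score.

    Assumed definition: 100 * (model - baseline) / baseline.
    """
    if baseline_score == 0:
        raise ValueError("baseline_score must be non-zero")
    return 100.0 * (model_score - baseline_score) / baseline_score


# Hypothetical BLEU scores, chosen for illustration only (not from the paper).
baseline_bleu = 40.0
finetuned_bleu = 44.0

print(f"{relative_improvement(finetuned_bleu, baseline_bleu):.1f}%")  # prints "10.0%"
```

Under this definition a positive value means the fine-tuned model scores above its baseline, and a negative value means it scores below.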