
WHEN FLUE MEETS FLANG: Benchmarks and Large Pre-trained Language Model for Financial Domain

Raj Sanjay Shah, Kunal Chawla, Dheeraj Eidnani, Agam Shah, Wendi Du, Sudheer Chava, Natraj Raman, Charese Smiley, Jiaao Chen, Diyi Yang

TL;DR

Financial language differs from general text, and prior financial language models have often underused domain data. The authors present FLANG-BERT and FLANG-ELECTRA, trained with financial keyword and phrase masking and a span boundary objective (SBO), achieving strong performance across downstream tasks. They also introduce FLUE, a GLUE-like benchmark suite spanning five financial NLP tasks, to standardize evaluation. The authors report state-of-the-art results on sentiment analysis, headline classification, named entity recognition (NER), structure boundary detection, and question answering (QA). Ablations indicate that both domain-specific masking and SBO contribute to these gains, and the authors note that the approach can be generalized to other domains.

Abstract

Pre-trained language models have shown impressive performance on a variety of tasks and domains. Previous research on financial language models usually employs a generic training scheme to train standard model architectures, without fully leveraging the richness of financial data. We propose a novel domain-specific Financial LANGuage model (FLANG), which uses financial keywords and phrases for better masking, together with a span boundary objective and an in-filling objective. Additionally, evaluation benchmarks in the field have been limited. To this end, we contribute the Financial Language Understanding Evaluation (FLUE), an open-source, comprehensive suite of benchmarks for the financial domain. These include new benchmarks across 5 NLP tasks in the financial domain as well as common benchmarks used in previous research. Experiments on these benchmarks suggest that our model outperforms those in prior literature on a variety of NLP tasks. Our models, code, and benchmark data are publicly available on GitHub and Hugging Face.

Paper Structure (46 sections, 12 equations, 1 figure, 13 tables)

Figures (1)

  • Figure 1: Architecture of our model. We use finance-specific datasets and general English datasets (Wikipedia and BooksCorpus) for training the model. We follow the training strategy of ELECTRA with a span boundary task: a generator language model first predicts the masked tokens, and a discriminator then assesses whether each token is original or replaced. The generator and discriminator are trained end-to-end, and both words and phrases from the financial vocabulary are used for masking. The final discriminator is then fine-tuned on individual tasks in our contributed benchmark suite, Financial Language Understanding Evaluation (FLUE). Note that our method is not specific to ELECTRA and can be generalized to other models.
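
The masking step described in the caption, where words and phrases from a financial vocabulary are chosen for masking, can be illustrated with a small sketch. This is not the authors' implementation: the term list `FINANCE_TERMS`, the function names, the 15% default rate, and the rule "mask matched financial phrases first, then fill the remaining budget with random tokens" are all assumptions made for illustration. The generator, discriminator, and span boundary objective are not modeled here.

```python
import random

# Hypothetical toy vocabulary (an assumption); the paper's actual
# financial term list is not reproduced here.
FINANCE_TERMS = [("interest", "rate"), ("net", "income"), ("dividend",)]


def find_term_spans(tokens, terms):
    """Return (start, end) pairs where a term's tokens appear consecutively."""
    lowered = [t.lower() for t in tokens]
    spans = []
    for term in terms:
        n = len(term)
        for i in range(len(lowered) - n + 1):
            if tuple(lowered[i:i + n]) == term:
                spans.append((i, i + n))
    return spans


def choose_mask_positions(tokens, terms, mask_rate=0.15, seed=0):
    """Pick positions to mask, preferring financial words and phrases.

    Assumed policy: matched terms are masked whole, so the budget is a
    soft cap; any remaining budget is filled with random positions.
    """
    rng = random.Random(seed)
    budget = max(1, round(len(tokens) * mask_rate))
    chosen = set()

    spans = find_term_spans(tokens, terms)
    rng.shuffle(spans)
    for start, end in spans:
        if len(chosen) >= budget:
            break
        chosen.update(range(start, end))  # mask the whole phrase

    remaining = [i for i in range(len(tokens)) if i not in chosen]
    rng.shuffle(remaining)
    while len(chosen) < budget and remaining:
        chosen.add(remaining.pop())
    return sorted(chosen)


def apply_mask(tokens, positions, mask_token="[MASK]"):
    """Replace the tokens at the given positions with the mask token."""
    pos = set(positions)
    return [mask_token if i in pos else t for i, t in enumerate(tokens)]


sentence = "The central bank raised the interest rate and net income fell sharply"
tokens = sentence.split()
positions = choose_mask_positions(tokens, FINANCE_TERMS, mask_rate=0.3)
masked = apply_mask(tokens, positions)
```

With this toy input, the masking budget is spent on the two matched phrases ("interest rate" and "net income") before any random token is considered. In an ELECTRA-style setup, a sequence masked this way would be passed to the generator, whose predictions the discriminator then classifies as original or replaced.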