Sentiment Analysis in Finance: From Transformers Back to eXplainable Lexicons (XLex)

Maryan Rizinski, Hristijan Peshov, Kostadin Mishev, Milos Jovanovik, Dimitar Trajanov

TL;DR

The paper addresses the need for fast, interpretable sentiment analysis in finance by bridging lexicon-based methods and transformer models. It introduces eXplainable Lexicons (XLex), learned via transformer explanations (SHAP) to expand the Loughran-McDonald lexicon, and integrates XLex with LM to form XLex+LM. Empirical results across multiple finance datasets show XLex and XLex+LM outperform the LM baseline in accuracy, F1, and MCC, with substantial gains in vocabulary coverage and processing speed. The approach offers practical benefits for finance applications requiring transparency and real-time performance, while remaining adaptable to other domains and future improvements in optimization and parallelization.

Abstract

Lexicon-based sentiment analysis (SA) in finance leverages specialized, manually annotated lexicons created by human experts to extract sentiment from financial texts. Although lexicon-based methods are simple to implement and fast to operate on textual data, creating, maintaining, and updating the lexicons requires considerable manual annotation effort. These methods are also considered inferior to deep learning-based approaches, such as transformer models, which have become dominant in various NLP tasks due to their remarkable performance. However, transformers require extensive data and computational resources for both training and testing. Additionally, they involve significant prediction times, making them unsuitable for real-time production environments or systems with limited processing capabilities. In this paper, we introduce a novel methodology named eXplainable Lexicons (XLex) that combines the advantages of both lexicon-based methods and transformer models. We propose an approach that uses transformers and SHapley Additive exPlanations (SHAP) for explainability to learn financial lexicons. Our study presents four main contributions. Firstly, we demonstrate that transformer-aided explainable lexicons can enhance the vocabulary coverage of the benchmark Loughran-McDonald (LM) lexicon, reducing human involvement in annotating, maintaining, and updating the lexicons. Secondly, we show that the resulting lexicon outperforms the standard LM lexicon in SA of financial datasets. Thirdly, we illustrate that the lexicon-based approach is significantly more efficient than transformers in terms of model speed and size. Lastly, the XLex approach is inherently more interpretable than transformer models because lexicon models rely on predefined rules, allowing for better insights into the results of SA and making the XLex approach a viable tool for financial decision-making.
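
To make the interpretability argument concrete, the following is a minimal sketch of how a lexicon-based classifier might label a sentence. It is not the scoring rule used in the paper: the toy lexicon entries, the function name score_sentence, and the count-based decision rule are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
# These entries are illustrative only; they are not taken from XLex or LM.
TOY_LEXICON = {
    "profit": 1,
    "growth": 1,
    "gain": 1,
    "loss": -1,
    "decline": -1,
    "lawsuit": -1,
}


def score_sentence(sentence, lexicon):
    """Label a sentence by summing the polarities of the lexicon words it contains.

    Returns the label together with the matched (word, polarity) pairs,
    so the decision can be traced back to individual words.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    matched = [(tok, lexicon[tok]) for tok in tokens if tok in lexicon]
    total = sum(polarity for _, polarity in matched)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, matched


label, evidence = score_sentence(
    "Quarterly profit rose despite a lawsuit and strong growth.", TOY_LEXICON
)
print(label, evidence)
```

Because the prediction is a lookup plus a sum, every label comes with the list of words that produced it, and no model inference is needed at prediction time.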

Paper Structure

This paper contains 21 sections, 12 equations, 7 figures, 26 tables.

Figures (7)

  • Figure 1: Architecture of the data processing pipeline for generating the explainable lexicon (XLex). The upper section of the figure, labeled as "Extract positive and negative words using SHAP", illustrates the word extraction process using SHAP, followed by post-processing steps to generate separate positive and negative word datasets from the chosen source datasets. The lower section of the figure, referred to as "Create explainable lexicon", encompasses adding explainability features, handling duplicates, and merging the positive and negative datasets to form the comprehensive explainable lexicon XLex. The pipeline concludes by merging XLex with the Loughran-McDonald (LM) lexicon, resulting in the combined XLex+LM lexicon.
  • Figure 2: The explainable lexicon (XLex) and the LM lexicon are merged to form the combined XLex+LM lexicon. Before merging, a "Source" feature is added to both XLex and LM, and all features except "word" are prefixed so that XLex features and LM features can be told apart in the combined lexicon. Missing values are handled after the merge.
  • Figure 3: The LM lexicon undergoes a preparatory adjustment process to enable its seamless integration with the explainable XLex lexicon, resulting in the formation of the combined XLex+LM lexicon. This adjustment process includes the extraction of positive and negative words, subsequent word processing, handling of duplicates, and the final step of merging the positive and negative word sets.
  • Figure 4: A list of SHAP-based explainability features added to the explainable (XLex) and LM lexicons. For the LM lexicon, all features except "Category" are assigned a default value of 1.
  • Figure 5: The process of handling duplicate entries between the positive and negative words in each of the explainable (XLex) and LM lexicons. In the case of the LM lexicon, features designated as "Opposite" are assigned a default value of 0. The "Total Count" feature can be derived from the values of "Count (Selected)" and "Count (Opposite)".
  • ...and 2 more figures
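
The merge described in the Figure 2 caption can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the authors' code: the function names, the "XLex_" and "LM_" prefix format, the use of None as a placeholder for missing values, and the sample words and counts are all hypothetical. Only the feature names "Source", "Category", "Count (Selected)", and "Count (Opposite)", and the LM default values of 1 and 0, come from the captions above.

```python
def prefix_features(lexicon, prefix, source):
    """Add a "Source" feature, then prefix every feature name.

    The word itself is the dictionary key, so it stays unprefixed.
    The "<prefix>_<name>" format is an assumption for illustration.
    """
    prefixed = {}
    for word, features in lexicon.items():
        row = dict(features, Source=source)
        prefixed[word] = {f"{prefix}_{name}": value for name, value in row.items()}
    return prefixed


def merge_lexicons(xlex, lm, fill_value=None):
    """Outer-merge two word-keyed lexicons and fill missing features.

    fill_value is a placeholder; the caption does not say which value
    is actually used for missing entries.
    """
    left = prefix_features(xlex, "XLex", "XLex")
    right = prefix_features(lm, "LM", "LM")
    columns = sorted(
        {name for rows in (left, right) for row in rows.values() for name in row}
    )
    merged = {}
    for word in sorted(set(left) | set(right)):
        row = {**left.get(word, {}), **right.get(word, {})}
        merged[word] = {name: row.get(name, fill_value) for name in columns}
    return merged


# Hypothetical sample entries; the counts are made up for illustration.
xlex = {
    "surge": {"Category": "positive", "Count (Selected)": 5, "Count (Opposite)": 1},
    "loss": {"Category": "negative", "Count (Selected)": 7, "Count (Opposite)": 2},
}
# LM entries use the defaults named in the captions: 1, and 0 for "Opposite".
lm = {
    "loss": {"Category": "negative", "Count (Selected)": 1, "Count (Opposite)": 0},
}

combined = merge_lexicons(xlex, lm)
for word, row in combined.items():
    print(word, row)
```

Keeping the two feature sets side by side under distinct prefixes means a word found in both lexicons, such as "loss" in this sample, retains both sets of values, while a word found in only one lexicon carries the placeholder in the other lexicon's columns.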