Multi-Modal Opinion Integration for Financial Sentiment Analysis using Cross-Modal Attention

Yujing Liu, Chen Yang

TL;DR

The paper addresses financial sentiment analysis by jointly modeling two textual opinion streams, recency and popularity, through a novel cross-modal attention mechanism. It introduces Financial Multi-Head Cross-Attention (FMHCA) to exchange information between the BERT-based embeddings of the two streams, refines the resulting features with a transformer layer, and fuses them with Multimodal Factorized Bilinear (MFB) pooling before classification. On a large dataset covering 837 companies, the approach achieves 83.5% accuracy, significantly outperforming single-modality baselines and demonstrating the critical role of cross-modal exchange and fusion. The work shows robust performance across different BERT variants and highlights practical implications for real-time market monitoring and risk management, while outlining directions for multilingual extension and enhanced temporal modeling.
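This summary does not spell out FMHCA's internals. As a rough illustration only, the sketch below implements ordinary multi-head cross-attention: queries come from one opinion stream, and keys and values come from the other. Any finance-specific components of FMHCA are not modeled. The class name `CrossModalAttention`, the random initialization, and the dimensions are hypothetical choices. Plain Python lists stand in for tensors so the example runs without dependencies.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Hypothetical initialization; a trained model would learn these weights.
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class CrossModalAttention:
    """Multi-head cross-attention: queries from one modality, keys/values from the other.

    Illustrative stand-in for FMHCA; not the paper's exact design.
    """

    def __init__(self, d_model: int, num_heads: int, seed: int = 0):
        assert d_model % num_heads == 0, "d_model must split evenly across heads"
        rng = random.Random(seed)
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.w_q = random_matrix(d_model, d_model, rng)
        self.w_k = random_matrix(d_model, d_model, rng)
        self.w_v = random_matrix(d_model, d_model, rng)
        self.w_o = random_matrix(d_model, d_model, rng)

    def __call__(self, query_seq: Matrix, context_seq: Matrix):
        """Return (output, per-head attention weights).

        query_seq:   n_q x d_model token embeddings of the attending modality
        context_seq: n_k x d_model token embeddings of the other modality
        """
        q = matmul(query_seq, self.w_q)
        k = matmul(context_seq, self.w_k)
        v = matmul(context_seq, self.w_v)
        concat: Matrix = [[] for _ in query_seq]  # heads concatenated per query token
        weights = []
        for h in range(self.num_heads):
            lo, hi = h * self.d_head, (h + 1) * self.d_head
            q_h = [row[lo:hi] for row in q]
            k_h = [row[lo:hi] for row in k]
            v_h = [row[lo:hi] for row in v]
            # Scaled dot-product scores: each query token against every context token.
            scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(self.d_head)
                       for kj in k_h] for qi in q_h]
            attn = [softmax(r) for r in scores]  # n_q x n_k, each row sums to 1
            weights.append(attn)
            for i, row in enumerate(matmul(attn, v_h)):
                concat[i].extend(row)
        return matmul(concat, self.w_o), weights


if __name__ == "__main__":
    rng = random.Random(42)
    d_model = 8
    recency = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(4)]     # 4 "timely" tokens
    popularity = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(6)]  # 6 "trending" tokens
    rec_to_pop = CrossModalAttention(d_model, num_heads=2, seed=1)
    pop_to_rec = CrossModalAttention(d_model, num_heads=2, seed=2)
    rec_enriched, _ = rec_to_pop(recency, popularity)  # 4 x 8
    pop_enriched, _ = pop_to_rec(popularity, recency)  # 6 x 8
    print(len(rec_enriched), len(pop_enriched))
```

The example uses two modules, one per direction (recency attending to popularity and the reverse). This gives each stream a view of the other before the transformer layer. The bidirectional wiring is an assumption about how the exchange is set up, not a detail confirmed by the summary.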

Abstract

In recent years, financial sentiment analysis of public opinion has become increasingly important for market forecasting and risk assessment. However, existing methods often struggle to integrate diverse opinion modalities and to capture fine-grained interactions across them. This paper proposes an end-to-end deep learning framework that integrates two distinct modalities of financial opinions, the recency modality (timely opinions) and the popularity modality (trending opinions), through a novel cross-modal attention mechanism designed specifically for financial sentiment analysis. Although both modalities consist of textual data, they represent fundamentally different information channels: recency-driven market updates versus popularity-driven collective sentiment. Our model first uses BERT (Chinese-wwm-ext) for feature embedding and then employs our proposed Financial Multi-Head Cross-Attention (FMHCA) structure to exchange information between the two opinion modalities. The resulting features are refined by a transformer layer and fused using multimodal factorized bilinear (MFB) pooling for classification into negative, neutral, and positive sentiment. Extensive experiments on a comprehensive dataset covering 837 companies show that our approach achieves an accuracy of 83.5%, significantly outperforming the baselines; the margin over BERT+Transformer is 21 percent. These results highlight the potential of our framework to support more accurate financial decision-making and risk management.
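The fusion and classification steps can be sketched with the standard MFB formulation. Each modality's vector is projected with its own factor matrix. The two projections are multiplied elementwise and sum-pooled over windows of size k. The result is then passed through a signed square root and L2 normalization. The sketch makes several assumptions that this summary does not state. First, each stream is assumed to have already been reduced to a single vector, for example by mean-pooling the transformer outputs. Second, the factor size `k`, output size `o`, and the linear softmax head over negative, neutral, and positive are illustrative. Finally, the function names `mfb_pool` and `classify` are hypothetical.

```python
import math
import random

LABELS = ("negative", "neutral", "positive")


def project(x: list[float], w: list[list[float]]) -> list[float]:
    """Compute x @ w for a vector x (length m) and a matrix w (m x n)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def mfb_pool(x, y, u, v, k: int) -> list[float]:
    """Multimodal Factorized Bilinear pooling of two vectors.

    u has shape len(x) x (k*o) and v has shape len(y) x (k*o);
    the result is an o-dimensional, L2-normalized vector.
    """
    joint = [a * b for a, b in zip(project(x, u), project(y, v))]  # elementwise product, length k*o
    assert len(joint) % k == 0, "projection width must be a multiple of k"
    pooled = [sum(joint[i:i + k]) for i in range(0, len(joint), k)]  # sum-pool windows of size k
    powered = [math.copysign(math.sqrt(abs(z)), z) for z in pooled]  # signed square root
    norm = math.sqrt(sum(z * z for z in powered)) or 1.0  # avoid dividing by zero
    return [z / norm for z in powered]  # L2 normalization


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(t - m) for t in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, w_cls, b_cls):
    """Hypothetical linear + softmax head over the three sentiment classes."""
    logits = [t + b for t, b in zip(project(fused, w_cls), b_cls)]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, o = 8, 4, 5  # illustrative sizes, not the paper's hyperparameters

    def rand(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

    recency_vec = [rng.uniform(-1, 1) for _ in range(d)]     # assumed pooled recency features
    popularity_vec = [rng.uniform(-1, 1) for _ in range(d)]  # assumed pooled popularity features
    fused = mfb_pool(recency_vec, popularity_vec, rand(d, k * o), rand(d, k * o), k)
    label, probs = classify(fused, rand(o, 3), [0.0, 0.0, 0.0])
    print(label, [round(p, 3) for p in probs])
```

Sum-pooling over k factors is what keeps MFB cheap. Each output unit equals x^T W_i y with W_i = Σ_j u_j v_j^T, which is a rank-k bilinear form. As a result, the parameter count grows with (m + n)·k·o rather than with the m·n·o of a full bilinear layer.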

Paper Structure

This paper contains 20 sections, 15 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Overview of the proposed cross-modal attention-enhanced financial sentiment analysis architecture. The model encodes timely and trending opinions with BERT, applies cross-modal attention (MHA), passes the results through transformer layers, and fuses the representations using MFB before final classification.