Modal-adaptive Knowledge-enhanced Graph-based Financial Prediction from Monetary Policy Conference Calls with LLM

Kun Ouyang, Yi Liu, Shicheng Li, Ruihan Bao, Keiko Harimoto, Xu Sun

TL;DR

We address the challenge of predicting asset price movement and volatility from Monetary Policy Conference calls using multimodal data. We introduce MANAGER, a modal-adaptive, knowledge-enhanced graph framework that fuses text, external knowledge from FinDKG, video features from BEiT-3, and audio features from HuBERT, with an LLM backbone for task-specific prediction across six assets. Key contributions include integrating dynamic external knowledge, constructing a knowledge-enhanced cross-modal graph, and jointly predicting across assets with inter-asset relations, validated on the Monopoly dataset with comprehensive ablations and case studies. The results demonstrate that knowledge-guided, modality-aware fusion significantly improves predictive accuracy and volatility estimates, highlighting the practical value of external-domain knowledge in multimodal financial forecasting.

Abstract

Financial prediction from Monetary Policy Conference (MPC) calls is a new yet challenging task that aims to predict the price movement and volatility of specific financial assets by analyzing multimodal information, including text, video, and audio. Although existing work has achieved great success using cross-modal transformer blocks, it overlooks potentially useful external financial knowledge, the varying contributions of different modalities to financial prediction, and the innate relations among different financial assets. To tackle these limitations, we propose a novel Modal-Adaptive kNowledge-enhAnced Graph-basEd financial pRediction scheme, named MANAGER. Specifically, MANAGER uses FinDKG to obtain external knowledge related to the input text. Meanwhile, MANAGER adopts BEiT-3 and Hidden-unit BERT (HuBERT) to extract the video and audio features, respectively. MANAGER then introduces a novel knowledge-enhanced cross-modal graph that fully characterizes the semantic relations among text, external knowledge, video, and audio, so as to adaptively utilize the information in different modalities, with ChatGLM2 as the backbone. Extensive experiments on the publicly available Monopoly dataset verify the superiority of our model over cutting-edge methods.
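
To make the idea of a knowledge-enhanced cross-modal graph more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how edges are defined, so the edge rules here (adjacent text tokens are linked, each knowledge entity is linked to its anchor token, and every video or audio node is linked to every text token), the function names `build_graph` and `gcn_layer`, and the single graph-convolution step are all illustrative assumptions.

```python
import math


def build_graph(n_text, anchors, n_video, n_audio):
    """Build a symmetric adjacency matrix with self-loops (nested lists).

    Node order: text tokens, knowledge entities, video nodes, audio nodes.
    `anchors[k]` is the index of the text token that knowledge entity k
    attaches to. All edge rules are assumptions made for illustration.
    """
    n = n_text + len(anchors) + n_video + n_audio
    adj = [[0.0] * n for _ in range(n)]

    def connect(i, j):
        adj[i][j] = adj[j][i] = 1.0

    for i in range(n):
        adj[i][i] = 1.0                      # self-loop on every node
    for i in range(n_text - 1):
        connect(i, i + 1)                    # neighbouring text tokens
    k0 = n_text
    for k, anchor in enumerate(anchors):
        connect(k0 + k, anchor)              # knowledge entity <-> anchor token
    m0 = k0 + len(anchors)
    for m in range(n_video + n_audio):
        for t in range(n_text):
            connect(m0 + m, t)               # video/audio node <-> each text token
    return adj


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W)."""
    n = len(adj)
    deg = [sum(row) for row in adj]          # > 0 thanks to self-loops
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    d_in, d_out = len(weight), len(weight[0])
    agg = [[sum(norm[i][j] * feats[j][c] for j in range(n))
            for c in range(d_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)] for i in range(n)]


# Toy usage: 3 text tokens, 1 knowledge entity anchored at token 1,
# 1 video node and 1 audio node, each with a 2-dimensional feature.
adj = build_graph(3, [1], 1, 1)
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
weight = [[1.0, 0.0], [0.0, 1.0]]
fused = gcn_layer(adj, feats, weight)
```

In this sketch each node's updated feature mixes information from its neighbours across modalities. A modal-adaptive model would presumably learn how much each modality contributes rather than using fixed unit edge weights as done here.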

Paper Structure

This paper contains 26 sections, 7 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: An example of financial prediction from MPC calls. We also present the external knowledge inferred by FinDKG for the given text. Words in blue are the anchor entities, words in green are the relations, and words in red are the related entities.
  • Figure 2: The architecture of MANAGER, which consists of four key components: External Financial Knowledge Acquisition, Video-audio Feature Extraction, Knowledge-enhanced Modal-adaptive Context Comprehension, and Task-specific Instruction Tuning for Financial Prediction.
  • Figure 3: Comparison between the results predicted by MANAGER and the best baseline, MPCNet, on one testing sample.