Cross-Modal Temporal Fusion for Financial Market Forecasting

Yunhua Pei, John Cartlidge, Anandadeep Mandal, Daniel Gold, Enrique Marcilio, Riccardo Mazzon

TL;DR

Cross-Modal Temporal Fusion (CMTF) is a transformer-based framework that unifies structured market data with unstructured news and financial reports through a tensor representation and encoding scheme. It includes a sparsity-driven tensor interpretation module for feature selection and an Optuna-based auto-training pipeline for hyperparameter tuning. On FTSE 100 data, CMTF outperforms classical and deep learning baselines in next-day price direction classification, with notable gains in recall and F1. The framework emphasizes interpretability and rapid adaptation to evolving market conditions, and is presented as a scalable solution for real-world cross-modal financial forecasting.

Abstract

Accurate forecasting in financial markets requires integrating diverse data sources, from historical prices to macroeconomic indicators and financial news. However, existing models often fail to align these modalities effectively, limiting their practical use. In this paper, we introduce a transformer-based deep learning framework, Cross-Modal Temporal Fusion (CMTF), that fuses structured and unstructured financial data for improved market prediction. The model incorporates a tensor interpretation module for feature selection and an auto-training pipeline for efficient hyperparameter tuning. Experimental results using FTSE 100 stock data demonstrate that CMTF achieves superior performance in price direction classification compared to classical and deep learning baselines. These findings suggest that our framework is an effective and scalable solution for real-world cross-modal financial forecasting tasks.

Paper Structure

This paper contains 32 sections, 22 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Overview of proposed CMTF. The framework integrates multimodal data (historical data, macro index, news, and financial reports). It employs Tensor Representation (extract tensor representation from unstructured data), Tensor Encoding (scale and preprocess the collected tensor), Tensor Interpretation (select important tensors), and a Transformer-based forecasting model (apply the optimal training scheme).
  • Figure 2: Tensor representation pipeline of financial reports and news.
  • Figure 3: Demonstration of how tensor interpretation accumulates feature values over time. Here, we assume that CMTF undergoes monthly training iterations; a higher label count indicates greater importance for that period.
  • Figure 4: Hyperparameter search strategies over the loss landscape: grid search, random search, and the Tree-structured Parzen Estimator (TPE).
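
The three search strategies named in the Figure 4 caption can be compared on a toy problem. The sketch below is illustrative only and is not the paper's auto-training pipeline, which the TL;DR describes as Optuna-based. The loss function, the two hyperparameters (`lr_exp`, `dropout`), their bounds, and the heavily simplified TPE (independent one-dimensional kernel density estimates with a fixed bandwidth) are all assumptions made for the example.

```python
import math
import random

# Hypothetical search space: log10 of the learning rate and a dropout rate.
BOUNDS = {"lr_exp": (-5.0, -1.0), "dropout": (0.0, 0.5)}


def toy_loss(params):
    # Stand-in for a validation loss; its minimum is at lr_exp=-3, dropout=0.2.
    return (params["lr_exp"] + 3.0) ** 2 + 4.0 * (params["dropout"] - 0.2) ** 2


def grid_search(steps):
    # Evaluate every point of a regular steps x steps grid.
    def axis(lo, hi):
        return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]

    trials = []
    for lr in axis(*BOUNDS["lr_exp"]):
        for dr in axis(*BOUNDS["dropout"]):
            p = {"lr_exp": lr, "dropout": dr}
            trials.append((toy_loss(p), p))
    return trials


def sample_uniform(rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def random_search(n_trials, rng):
    # Evaluate independent uniform samples from the search space.
    trials = []
    for _ in range(n_trials):
        p = sample_uniform(rng)
        trials.append((toy_loss(p), p))
    return trials


def kde(x, centers, bandwidth):
    # Gaussian kernel density estimate, up to a constant factor.
    total = sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers)
    return total / (len(centers) * bandwidth) + 1e-12


def tpe_search(n_trials, rng, n_startup=8, gamma=0.25, n_candidates=24):
    # Simplified TPE: split past trials into "good" and "bad" by loss, then
    # propose the candidate that maximises density_good / density_bad.
    trials = []
    for _ in range(n_startup):
        p = sample_uniform(rng)
        trials.append((toy_loss(p), p))
    while len(trials) < n_trials:
        ranked = sorted(trials, key=lambda t: t[0])
        n_good = max(1, int(gamma * len(ranked)))
        good, bad = ranked[:n_good], ranked[n_good:]
        proposal = {}
        for name, (lo, hi) in BOUNDS.items():
            bw = (hi - lo) / 5.0  # fixed bandwidth, an arbitrary choice
            good_x = [p[name] for _, p in good]
            bad_x = [p[name] for _, p in bad]
            # Draw candidates around good points and clip them to the bounds.
            cands = [
                min(hi, max(lo, rng.gauss(rng.choice(good_x), bw)))
                for _ in range(n_candidates)
            ]
            proposal[name] = max(
                cands, key=lambda x: kde(x, good_x, bw) / kde(x, bad_x, bw)
            )
        trials.append((toy_loss(proposal), proposal))
    return trials


if __name__ == "__main__":
    rng = random.Random(0)
    for label, trials in [
        ("grid", grid_search(5)),
        ("random", random_search(25, rng)),
        ("tpe", tpe_search(25, rng)),
    ]:
        best_loss, best_params = min(trials, key=lambda t: t[0])
        print(label, round(best_loss, 4), best_params)
```

All three functions return the full trial history as `(loss, params)` pairs, so they can be compared under the same evaluation budget (25 evaluations each in the usage above). Grid and random search ignore past results. The TPE-style loop uses them to concentrate new proposals where low-loss trials have clustered, which is the behaviour the figure contrasts.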